same place in the viral sequence. Analysis of the effects of
mutations near the ends of the LTRs on integration in vivo
and in vitro shows that this specificity is provided both by
recognition of a specific sequence containing the inverted 
repeat at each end and by proximity of this sequence to the 
ends of the viral DNA (141,774). This sequence, some- 
times called att, probably forms the only signal required 
in cis by the viral integration machinery. Mutational analy- 
sis has narrowed the sequence required to as few as 6 bases 
from the ends of the LTR (92,527,660). 

From the perspective of the cellular target, most ap-
proaches to the problem suggest that integration is a much
more random process. Comparison of cellular sequences 
flanking a modest number of independent integration events 
(12 to 20 in most cases studied) reveals no common se-
quence which might serve as a cellular target. Similarly,
analysis of the distribution of integration events into target 
DNA in vitro is not suggestive of any specificity for a par-
ticular site (81), although it may be possible to define loose
"consensus" sequences by analyzing large numbers of inte-
gration joints (223). More incisive analyses using PCR to
determine the pattern of integration into small defined re-
gions of target DNA reveal that integration targets can be
found in all regions examined, but that there is a decided- 
ly nonrandom pattern of usage of specific sites within any 
region (395,597). This pattern differs from one virus to an- 
other (Y. Kitamura and J.M. Coffin, unpublished obser- 
vations) and seems to reflect the interaction of the inte-
grase system with local structural features. Incorporation
of a DNA target into chromatin does not block integration 
but does alter its specificity in a striking way, such that in- 
tegration targets appear with 10-base periodicity where nu-
cleosomes are present (599). Similarly, integration is not
inhibited by C-methylation of DNA, and this modification 
can even create strong target sites (395). Strong target sites 
for integration in vitro can also be created by introducing 
bends into the target DNA (596). The role of such features 
in the infected cell remains to be examined. 

Another approach to the issue of integration specificity 
is to compare sites of integration selected for insertion into 
the same general region, e.g., in tumors induced by activa- 
tion of proto-oncogenes. This type of analysis can be com- 
plicated by selection for integration in specific regions by 
effects on cell growth. In B-cell lymphomas induced by ALV 
inoculation of chickens (see below), the large majority of
tumors have an ALV provirus inserted within the first intron 
of the c-myc protooncogene (259,628). Since proviruses have 
not been found in this region in any other experiment, there 
is no reason to believe that integration into it is especially 
favored. Their presence is a consequence of selection of cells 
transformed by provirus alteration of c-myc expression. Se- 
lection for proviruses integrated in specific regions can also 
be more subtle, as a consequence of reduced growth factor 
dependence of tissue culture cells (746), for example. For 
this reason, inferences regarding integration targeting are 
reliably drawn only from analysis of newly infected cells. 



Several reports suggest a tendency for integration
to occur in regions of DNA that might tend to be tran- 
scriptionally active, e.g., regions characterized by rela- 
tively "open" chromatin structure. Proviruses cloned from 
MLV-infected cells were often found to be integrated in 
the vicinity of active genes or DNase-sensitive sites which 
often mark actively transcribed genes (631,648,772), and 
the myc-associated ALV proviruses mentioned above
showed a tendency to be located in proximity to one of 
five DNase-hypersensitive sites in c-myc (628). Another 
level of specificity has been reported for a fraction of in- 
tegrations of ALSV into avian cell DNA (662), which sug- 
gested that some 20% of integrations are into one of about 
1,000 specific sites. 

A more recent analysis which makes use of PCR to de- 
tect and localize single integration events in large cultures 
of cells yields a rather different picture of the specificity 
issue (807). Targets for ALSV integration in newly infect- 
ed cells were found to be distributed very much like tar- 
gets in vitro. All tested regions of the genome contained 
integration sites, with very local hot spots. On average, the 
frequency of targets per region was like that expected on 
a purely random basis, suggesting the absence of any strong 
regional targeting. 
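
The comparison with a "purely random basis" can be illustrated with a minimal sketch. It assumes that integration events are counted in equal-sized regions and that random integration behaves like a Poisson process, so the variance-to-mean ratio of the counts is close to 1. The function names and the counts below are hypothetical and are not taken from the cited study (807).

```python
from statistics import mean, pvariance


def expected_events(total_events, region_bp, genome_bp):
    """Expected number of events in one region if integration were uniform."""
    return total_events * region_bp / genome_bp


def dispersion_index(counts):
    """Variance-to-mean ratio of per-region counts.

    Values near 1 are what a Poisson (random) process would give;
    values far above 1 indicate clustering of events in a few regions.
    """
    m = mean(counts)
    if m == 0:
        raise ValueError("no integration events observed")
    return pvariance(counts) / m


# Hypothetical per-region counts (illustration only, not data from the study).
even_counts = [4, 6, 5, 3, 7, 5, 4, 6]
clustered_counts = [0, 0, 0, 20]

print(expected_events(1000, 1_000_000, 1_000_000_000))
print(dispersion_index(even_counts), dispersion_index(clustered_counts))
```

A regional count close to `expected_events` with a low dispersion index would be consistent with the absence of strong regional targeting. Local hot spots within a region are a separate question that this coarse tally does not address.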



Mechanism of Integration 

Studies of the mechanism of integration have been made 
possible by the availability of powerful in vitro systems. 
The first of these consisted simply of extracts of cells made 
at a time (about 12 to 24 hours after infection) when viral 
DNA synthesis is complete but integration is still ongoing. 
If an appropriate target DNA is added to such an extract 
(either from nuclei or cytoplasm), then integration of the 
viral DNA mediated by preintegration complexes in the 
extract can be readily detected. Suitable detection meth- 
ods include the use of a selectable marker and cloning after 
integration into a suitable vector (81,224), Southern blot- 
ting to reveal integration into a closed circular DNA tar- 
get (206,240,426), and PCR using primers derived from 
the target DNA and the LTR (395,599). These reactions
are quite efficient. In all cases, detailed analysis of the re- 
action products shows that the in vitro reaction is correct. 
There is a loss of the appropriate number of bases from the 
end of the viral DNA and a duplication of the correct num- 
ber of bases of cellular sequence flanking the provirus. 

Success with reactions from preintegration complexes 
led to the development of model reactions using IN pro- 
tein purified from virions or produced by recombinant tech- 
niques and simple labeled double-stranded oligonucleotides 
as substrates (94,373,378). Although the purified systems 
are much less efficient than reactions using preintegration 
complexes, and their biochemical properties (such as di-
valent cation requirement) are somewhat different, they
have permitted identification of IN as the source of all nec-
essary catalytic activity, as well as in-depth analysis and
understanding of the integration mechanism. In these re- 
actions, incubation of suitable substrate oligonucleotides 
with IN, followed by analysis by electrophoresis in dena- 
turing polyacrylamide gels, reveals two major classes of 
product: a molecule two bases shorter than the substrate, 
representing the product of the cleavage reaction which re- 
moves the 3' dinucleotide, and a heterogeneous collection
of larger molecules resulting from the strand transfer re- 
action in which the 3' end of the substrate oligonucleotide 
is integrated into an internal position on another (target) 
molecule. In the simplest system, substrate and target are 
the same; however, integration into other target molecules 
such as plasmid DNA can also be assayed. This provides 
the basis for some rapid, quantitative assays for integrase 
activity (149,524). 
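
The two product classes seen on a denaturing gel can be sketched as simple length arithmetic. This is a simplified model, assuming that every internal phosphodiester bond of a single-stranded target is equally available for strand transfer. The function names and the 21-base example are hypothetical.

```python
def cleavage_product_length(substrate_len):
    """Length of the processed strand after removal of the 3' dinucleotide."""
    if substrate_len < 3:
        raise ValueError("substrate too short to lose two bases")
    return substrate_len - 2


def strand_transfer_lengths(substrate_len, target_len):
    """Possible lengths of strand transfer products.

    The processed 3' end is joined to the target at an internal bond, so the
    product is the processed strand plus the part of the target strand that
    lies 3' of the attacked bond. p counts the target bases 5' of that bond.
    """
    processed = cleavage_product_length(substrate_len)
    return sorted({processed + (target_len - p) for p in range(1, target_len)})


# Hypothetical assay in which substrate and target are the same 21-base oligo.
print(cleavage_product_length(21))
print(strand_transfer_lengths(21, 21))
```

In this model the cleavage product is a single band two bases below the substrate, and the strand transfer products form a ladder, most of it above the substrate. That matches the qualitative description of one shorter molecule and a heterogeneous set of larger ones.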

Using these approaches, the following information has 
been gleaned regarding the integration process. 

First, concordant with the rationale, integration in ex- 
tracts of infected cells is mediated by the preintegration 
complexes; the activity of complexes containing only lin- 
ear DNA molecules implies the linear form of DNA as the 
important intermediate, a conclusion well substantiated 
by subsequent biochemical analysis. At least in the case 
of MLV and HIV, structures purified from soluble cellu- 
lar components by sedimentation or chromatography are 
still active in integration, implying that no soluble cell fac- 
tors are required (65,206). Indeed, active HIV complexes 
containing only DNA and IN can be isolated (208), indi- 
cating that other viral proteins, though often present, may 
also be dispensable. 

Second, the reaction carried out by the MLV complex- 
es is limited to integration into the added target. No side 
reactions, such as circle formation or autointegration, are 
seen (82,240). In the case of ALSV and HIV, however, the 
situation seems to be somewhat more complex. Circular 
products due to integration of the viral DNA into itself as 
well as single LTR circles are found among the reaction 
products in high yield (Fig. 12) (207,426,427), although 
their formation can be suppressed by appropriate reaction 
conditions (427). Presumably, there are specific mecha- 
nisms to block autointegration in vivo, although small 
amounts of one- and two-LTR circles, as well as the cir- 
cular products arising from integration of viral DNA mol- 
ecules into themselves, can be found in infected cells (666). 
Recent evidence suggests that a host cell factor, tightly 
bound to the MLV complex, may suppress autointegration 
in this virus (R. Craigie, personal communication). How 
this important aim is accomplished in other viruses remains 
unclear. In the case of ALSV, the DNA in cytoplasmic 
preintegration complexes is incomplete (427). Perhaps there 
is a block to some event required for completion of DNA 
synthesis (such as the plus-strand jump) that causes a delay 
until the complex is in the nucleus where integration tar- 
gets are abundant. 



Third, most of the 3' ends of the viral DNA in the com-
plexes have already been cleaved by IN to remove the ter-
minal two bases, while the 5' end of each strand is pre-
cisely at the site of its initiation, and therefore has been
modified only by removal of the primer. The ends thus have
the sequence

5' AATG ... CA   3'

3'   AC ... GTAA 5'

Since the majority of unintegrated forms can have this
structure some time prior to the integration reaction, its
formation must not be tightly coupled kinetically to the
rest of the integration reaction. However, cleavages at the two
ends of a molecule are coupled to one another; mutations
in MLV at one att site block cleavage at both (529).

The preintegration complex reaction joins the previously
formed 3' end to a 5' end of the target DNA (82,24[...]). The
5' ends of the viral DNA are unaltered and remain unjoined.
This result strongly implicated a linear molecule as the in-
tegration intermediate rather than a two-LTR circular in-
termediate, as was previously thought. This conclusion was
confirmed by the suitability of small oligonucleotides as
substrates for integrase.

Fourth, mutational analysis implies that the cleavage and
strand transfer reactions take place at the same active site
(195,372,414,759), requiring the highly conserved D,D(35)E
motif. While the coincidence of the two different reac-
tions at the same site might at first seem contradictory, the
underlying mechanism of the two reactions is, in fact, the
same (196,759) (Fig. 16). In both cases, the reaction in-
volves a direct attack on a phosphate group by a hydroxyl
group, leading to exchange of an internucleotide bond. In
the usual cleavage reaction, the OH donor is water; how-
ever, other molecules, such as glycerol or the 3' OH of the
same DNA (leading to a cyclic dinucleotide product), can
also participate (196). In the strand transfer reaction, the
OH donor is the 3' end of the newly cleaved DNA, lead-
ing to a direct transesterification. Thus, the integration re-
action is concerted, rather than involving separate cleav-
age and ligation reactions.

Fifth, there is no obvious requirement for a particular
sequence or structure of the target DNA. Indeed, as noted
above, both incorporation of DNA into chromatin
and extensive methylation of dC residues (395), modifi-
cations associated with reduced transcriptional activity,
can create strong target sequences for integration in vitro.
Their in vivo effects remain to be tested.

Finally, the reaction proceeds efficiently in the absence
of ATP or any other added energy-generating system
(81,94). This independence is consistent with the concerted
breakage-rejoining reaction. Also consistent is the ob-
servation that IN can catalyze the reverse reaction, re-
ferred to as disintegration. If provided with a DNA mol-
ecule resembling the integration intermediate, IN can
FIG. 16. IN-mediated cleavage (A) and strand transfer (B). Note that the underlying mechanism (OH-mediated
attack on an internucleotide phosphate bond) is the same for both reactions (196,775).



separate it into two molecules, resealing the adjacent nick 
in the process (120). 

All of these considerations have led to formulation of 
the pathway shown in Fig. 17. 

1. Following viral DNA synthesis, the core structure con- 
taining linear DNA, and the CA, IN, and possibly RT 
and NC proteins, enters the nucleus. 

2. The 3' terminal two bases at either end are removed
by the cleavage reaction of IN, leaving a 3' OH end. 
This reaction may occur before entry into the nucleus. 



3. The strand transfer reaction simultaneously joins the 
two ends of the viral DNA to cellular DNA about half 
a turn of the helix apart, with the precise spacing de- 
termined by the geometry of the IN multimer. 

4. A cellular DNA repair system fills in the resulting gap 
in the molecule, displacing the two mismatched bases 
at the 5' end of the provirus and ligating the remain- 
ing ends. This gap repair of the initial staggered joints 
generates the characteristic duplication of cell DNA 
flanking the provirus. 
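
The outcome of the four steps above can be sketched on a single strand of sequence. This is a toy model, not a description of the enzymology: it assumes the viral DNA loses two bases from each end and that the staggered joints, once repaired, duplicate `stagger` bases of target sequence on either side of the provirus. The function name, the sequences, and the stagger of 4 are hypothetical choices for illustration.

```python
def integrate(target, viral, pos, stagger):
    """Return (integrated top strand, duplicated target bases).

    target  : top-strand sequence of the cellular target
    viral   : top-strand sequence of the unintegrated linear viral DNA
    pos     : index in target where the staggered joints begin
    stagger : number of target bases between the two joints
    """
    if len(viral) < 5:
        raise ValueError("viral DNA too short to lose two bases per end")
    if stagger < 1 or pos < 0 or pos + stagger > len(target):
        raise ValueError("staggered joints must lie within the target")
    # Steps 2 and 4: two bases are lost from each end of the viral DNA.
    processed = viral[2:-2]
    # Steps 3 and 4: gap repair copies the bases between the two joints,
    # so they appear on both sides of the provirus.
    duplicated = target[pos:pos + stagger]
    integrated = target[:pos + stagger] + processed + target[pos:]
    return integrated, duplicated


# Hypothetical sequences; the viral ends follow the AATG...CATT pattern.
product, dup = integrate("CCGGATTACAGGCC", "AATGAAAAAACATT", pos=4, stagger=4)
print(product)
print(dup)
```

In the output the provirus begins with TG and ends with CA, and the same four target bases flank it on both sides. These are the two features that the text describes as a loss of bases from the viral ends and a duplication of cellular sequence.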
Adeno-Associated Virus Vector Integration Junctions

Vectors derived from adeno-associated virus (AAV) have the potential to stably transduce mammalian cells
by integrating into host chromosomes. Despite active research on the use of AAV vectors for gene therapy, the 
structure of integrated vector proviruses has not previously been analyzed at the DNA sequence level. Studies 
on the integration of wild-type AAV have identified a common site-specific integration locus on human 
chromosome 19; however, most AAV vectors do not appear to integrate at this locus. To improve our under- 
standing of AAV vector integration, we analyzed the DNA sequences of several integrated vector proviruses. 
HeLa cells were transduced with an AAV shuttle vector, and integrated proviruses containing flanking human 
DNA were recovered as bacterial plasmids for further analysis. We found that AAV vectors integrated as 
single-copy proviruses at random chromosomal locations and that the flanking HeLa DNA at integration sites 
was not homologous to AAV or the site-specific integration locus of wild-type AAV. Recombination junctions 
were scattered throughout the vector terminal repeats with no apparent site specificity. None of the integrated 
vectors were fully intact. Vector proviruses with nearly intact terminal repeats were excised and amplified after 
infection with wild-type AAV and adenovirus. Our results suggest that AAV vectors integrate by nonhomolo- 
gous recombination after partial degradation of entering vector genomes. These findings have important 
implications for the mechanism of AAV vector integration and the use of these vectors in human gene therapy. 



Adeno-associated virus (AAV) is a 4.7-kb single-stranded 
DNA virus that has been developed as a gene therapy vector 
(31). Only the terminal repeat (TR) sequences are required in 
cis for replication and packaging, allowing a complete replace- 
ment of viral coding sequences with foreign DNA in vectors. A 
major advantage of AAV vectors is their ability to stably trans- 
duce cells by integration into host chromosomes. Although 
integration is crucial for many gene therapy applications, the 
mechanism of AAV vector integration is poorly understood, 
and the structure of integrated proviruses has not been deter- 
mined at the DNA sequence level. 

Southern analysis of DNA from transduced cells maintained 
under selection for the vector transgene has usually demon- 
strated the presence of integrated AAV vector proviruses (19, 
27, 30, 36, 42). However, there have also been reports of 
episomal vector molecules, especially with rep+ vectors (30) or
the absence of selectable markers (1, 4). Some reports of 
vector integration noted a predominance of concatemeric pro- 
viruses (30, 36), while others did not (11, 27, 33, 44). The 
reasons for these variable results are unclear and include dif- 
ferences in transduction protocols as well as possible effects of 
contaminating wild-type AAV functions. 

Two-thirds of integrated wild-type AAV proviruses are 
found at a specific human chromosome 19 site, 19q13-qter (24,
25, 38). While this feature could prove useful in some gene 
therapy applications, AAV vectors have not been found to 
integrate at this same locus (33, 44). Site-specific AAV inte- 
gration appears to be mediated by the viral Rep protein (16, 
28), so the absence of the rep gene in most vectors can explain 
the lack of site-specific integration. The presence of random- 
sized junction fragments detected by Southern analysis of in- 
tegrated proviruses suggests that vector integration sites are 
random (27, 30, 33, 44). However, it is possible that vector 
integration occurs at scattered locations within a common lo-
cus different from 19q13-qter, just as wild-type AAV integrants
are found scattered throughout the chromosome 19 site-spe- 
cific integration locus (23, 38). 

To better understand the process of AAV vector integration, 
we produced an AAV shuttle vector that allowed us to recover 
integrated vector proviruses along with flanking human DNA 
as bacterial plasmids. Several integration junctions from inde- 
pendent, transduced HeLa cell clones were sequenced and 
compared. Our results show that vector integration occurs at 
random chromosomal sites and that none of the vector provi- 
ruses integrated as intact, full-length genomes. 

MATERIALS AND METHODS 

Cell culture. Human HeLa (39) and 293 cells (17) were cultured in Dulbecco's
modified Eagle medium with 10% heat-inactivated (56°C for 30 min) fetal bovine
serum (HyClone, Logan, Utah), amphotericin (1.25 µg/ml), penicillin (100 U/
ml), and streptomycin (100 µg/ml) at 37°C in a 10% CO2 atmosphere. Titers of
vector stocks were determined on HeLa cells by selecting for G418 (GIBCO-
BRL, Grand Island, N.Y.) resistance as described previously (33) except that the
G418 concentration was 1 mg of active compound per ml. Transduction of HeLa
cells by AAV-SNori was carried out as when titering for G418 resistance except
that resistant colonies were picked and expanded to 2 x 10^7 cells for isolation of
genomic DNA. The multiplicities of infection were 0.17 vector particles per cell
for clones 2 and 6; 1.7 for clones 1, 3, 4, 10, and 11; and 170 for clones 5, 7 to 9,
and 12 to 14 (see Table 1).

Preparation of virus stocks. AAV-SNori vector stocks were prepared as fol-
lows. 293 cells were plated at a density of 8 x 10* cells/dish in 12 dishes (15-cm
diameter). The next day, each dish was infected with 1.2 x 10* PFU of adeno-
virus type 5 (ATCC VR-5; American Type Culture Collection, Rockville, Md.)
and 2 hours later cotransfected with 8 µg of pASNori2 and 32 µg of pAAV/Ad
(36) by the calcium phosphate method (35). After 3 days, the cells and medium
were harvested and combined, subjected to three cycles of freeze-thaw lysis in a
dry ice-ethanol bath, clarified by centrifugation at 5,800 x g (5,500 rpm) in a
Sorvall HS4 rotor for 30 min at 4°C, digested with micrococcal nuclease (68
U/ml; Pharmacia, Piscataway, N.J.) at 37°C for 1 h, treated with trypsin (50
ng/ml) at 37°C for 30 min, and centrifuged through 40% sucrose in phosphate-
buffered saline in a Beckman SW28 rotor at 27,000 rpm for 16 h at 4°C. The
pellets were resuspended in 8 ml of a 0.51-g/ml solution of CsCl and passed twice
through a 22-gauge needle. The suspension was centrifuged in a Beckman SW41
rotor at 37,000 rpm for 20 h at 4°C. The region of the gradient containing AAV
virions was collected, dialyzed against Dulbecco's modified Eagle medium
through a 50,000-molecular-weight-cutoff membrane (Spectrum, Houston, Tex.),
and concentrated by centrifugation in Centricon 100 filters (Amicon, Inc., Bev-
erly, Mass.). Adenovirus was inactivated by treatment at 56°C for 1 h. The final
stock contained 6.8 x 10" genomes per µl as determined by Southern analysis
(33). The level of contaminating wild-type AAV was <2.3 x 10^4 genomes per µl
as determined by Southern analysis using AAV coding sequences as a probe.

Plasmid constructions. To construct pASNori2 (see Fig. 1), a BamHI-Esp3I
fragment of pSV2neo (41) containing an SspI-Bst1107I origin fragment from
pACYC184 (7) in the BstBI site (end filled with the Klenow fragment of DNA
polymerase I) downstream of the neo (neomycin phosphotransferase) gene was
inserted in the BglII sites of the AAV vector backbone of pTR (34) after
attachment of BamHI linkers to the pSV2neo Esp3I site. The pACYC184 frag-
ment contains the p15A bacterial plasmid origin (10), with the direction of
leading-strand DNA synthesis opposite that of neo gene transcription. pASNori2
also contains the pMB1 plasmid origin (5) from pBR322 (6). pASNori1, a
deletion derivative of pASNori2 lacking the pMB1 origin, was constructed by end
filling and circularizing a BsaI-Bst1107I fragment of pASNori2.

Isolation of HeLa genomic DNA. Two confluent 10-cm-diameter dishes of each
HeLa clone (2 x 10^7 cells) were lysed in genomic DNA lysis buffer (10 mM Tris
[pH 8], 1 mM EDTA, 200 mM NaCl, 0.5% sodium dodecyl sulfate, 200 µg of
proteinase K per ml) at 37°C overnight. The samples were then extracted with
phenol and chloroform, extracted with butanol, and precipitated with 2 volumes
of ethanol overnight at -20°C. The DNA was pelleted at 6,000 rpm in a Sorvall
HS-4 rotor at 4°C for 25 min, washed with 70% ethanol, and air dried briefly. The
DNA was resuspended in 500 µl of TE (10 mM Tris [pH 8], 1 mM EDTA),
digested with 10 µg of RNase A (Sigma, St. Louis, Mo.) at 37°C for 3 h, extracted
with phenol and chloroform, and precipitated with 50 µl of 3 M sodium acetate
and 1 ml of ethanol at -20°C. Each pellet was resuspended in 200 µl of TE.

Provirus recovery in bacteria. To recover integrated AAV-SNori proviruses,
10 µg of HeLa genomic DNA was treated with 20 U of calf intestinal phospha-
tase (Boehringer Mannheim, Indianapolis, Ind.) to prevent ligation of free ends
in the sample, heat inactivated at 65°C for 1 h, extracted with phenol and
chloroform, and precipitated with ethanol. The resuspended DNA was digested
with 20 U of EcoRI, which does not cut in the SNori vector, at 37°C for 4 h, heat
inactivated at 65°C for 30 min, extracted with phenol and chloroform, and
precipitated with ethanol. The resulting DNA fragments were resuspended and
circularized with 200 U of T4 DNA ligase in 400 µl at M'C overnight. The DNA
was precipitated and one-fifth of the sample (approximately 2 µg) was electro-
porated into supercompetent Escherichia coli XL1-Blue MRF' cells (Stratagene,
La Jolla, Calif.).

Provirus excision and amplification. HeLa cells and each of the nine HeLa
clones from which plasmid had been recovered (see Table 1) were plated at 10*
cells per 35-mm-diameter dish (Corning, Corning, N.Y.). The next day, cells were
infected with wild-type AAV type 2 at 10 replication-competent particles/cell and
adenovirus type 5 at 10 PFU/cell or were left uninfected as indicated in Fig. 6.
Forty-four hours after infection, episomal DNA was isolated by the method of
Hirt (21), with an additional proteinase K digestion, extraction with phenol and
chloroform, and precipitation with ethanol. One-fourth of the episomal DNA
from each dish was separated by alkaline agarose gel electrophoresis and trans-
ferred to Hybond-N+ membranes (Amersham, Arlington Heights, Ill.) accord-
ing to standard procedures (35). Standards were prepared by BsmI digest of
pASNori2 to produce a 2.6-kb fragment. DNA was detected by Southern analysis
using a neo gene probe.

DNA techniques. Restriction enzymes, T4 DNA ligase, and DNA polymerases
were from New England BioLabs, Beverly, Mass. Proteinase K was from Boehr-
inger Mannheim. Enzyme reactions were performed by using the manufacturer's
recommended conditions. DNA manipulation and Southern blot analysis were
performed by standard procedures (35). Southern blots were quantitated using a
PhosphorImager 400S (Molecular Dynamics, Sunnyvale, Calif.). Plasmids were
prepared by using Qiagen (Chatsworth, Calif.) columns. Dye terminator cycle
sequencing was carried out with an AmpliTaq FS polymerase sequencing kit
(Perkin-Elmer, Foster City, Calif.) and analyzed on an Applied Biosystems Inc.
(Foster City, Calif.) sequencer. Oligonucleotides for sequencing are P1
(5'-dTACAAATAAAGCAATAGCATCAC-3') and P2 (5'-dCCTCTGACACATGCAGCTC-3').
Sequences were analyzed by the Wisconsin version of the
Genetics Computer Group program, using FASTA with default parameters and
searching against GenBank/EMBL. HeLa flanking sequences were compared
with the human Alu repetitive sequence humalurp7 (GenBank accession no.
M57427), using FASTA with default parameters; a sequence with more than
70% identity over at least 100 nucleotides was considered a match. The probe
used for detecting vector genomes by Southern analysis contained internal AAV-
SNori sequences including the neo gene. The probe used for detecting site-
specific integration was a 1.7-kb EcoRI-BamHI fragment from the chromosome
19 integration locus cloned in pRE2 (38).
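
The match criterion stated above (more than 70% identity over at least 100 nucleotides) can be written down as a small check. This sketch does not reimplement FASTA. It assumes that an alignment has already been produced and is supplied as two equal-length strings, with "-" marking gaps. The function names and the test sequences are hypothetical.

```python
def percent_identity(aligned_a, aligned_b):
    """Percentage of alignment columns in which both sequences agree."""
    if len(aligned_a) != len(aligned_b) or not aligned_a:
        raise ValueError("aligned sequences must be non-empty and equal in length")
    same = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-")
    return 100.0 * same / len(aligned_a)


def is_match(aligned_a, aligned_b, min_identity=70.0, min_length=100):
    """Apply the stated rule: identity above 70% over at least 100 positions."""
    return (len(aligned_a) >= min_length
            and percent_identity(aligned_a, aligned_b) > min_identity)


# Hypothetical 120-column alignments.
reference = "ACGT" * 30
close = "T" * 30 + reference[30:]     # differences confined to the first 30 columns
distant = "N" * 40 + reference[40:]   # 40 mismatched columns

print(is_match(reference, close))
print(is_match(reference, distant))
print(is_match(reference[:50], reference[:50]))
```

The last call shows that a perfect but short alignment is not counted as a match, because the length threshold is applied independently of the identity threshold.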

RESULTS 

Transduction of HeLa cells by AAV-SNori. We designed an
AAV shuttle vector (AAV-SNori) that could be recovered as a 
bacterial plasmid along with flanking human DNA after inte- 
gration into host chromosomes. The recovered proviral plas- 
mid could then be propagated in bacteria, and the junction 
fragments could be sequenced. The AAV-SNori vector con- 
FIG. 1. AAV-SNori shuttle vector. The structure of the packaged genome of
the AAV-SNori vector is shown; features include the neomycin phosphotrans-
ferase gene (Neo), the SV40 and Tn5 promoters with their transcription start
sites (arrows), the p15A replication origin, the AAV TRs, and the polyadenyl-
ation site (pA). An expanded diagram of the TR structure is shown to indicate
the repeat domains. P1 and P2 represent binding sites for sequencing primers.
The positions of BsmI restriction sites are shown.



tains the neo gene under the control of both the simian virus 40 
(SV40) early promoter and the transposon 5 (Tn5) promoter
for expression in human or bacterial cells, as well as the p15A
bacterial replication origin packaged between AAV2 TRs (Fig.
1). These elements can support replication and confer kana-
mycin resistance in E. coli. The p15A origin was chosen be-
cause it replicates at the relatively low rate of 12 to 15 copies 
per E. coli chromosome (10), and we have found that plasmids 
containing the AAV TRs are more stable if they replicate at 
lower copy numbers (data not shown). 

HeLa cervical carcinoma cells were transduced with purified 
AAV-SNori. One day after infection, the cells were treated 
with trypsin, and dilutions were plated in culture medium with- 
out selection. Selection was begun 24 h later with the antibiotic 
G418. Approximately 3 x 10^4 intact vector particles were re-
quired to produce a single, stable, G418-resistant colony. As all
of these colonies contain integrated proviruses (see below), 
this represents the minimal vector integration frequency. The 
transient transduction rate was higher, as many of the colonies 
visible at earlier time points did not survive continued selection 
in G418, perhaps due to episomal vector gene expression. 
Fourteen G418-resistant colonies were isolated and expanded 
to approximately 2 x 10^7 cells under selection. Genomic DNA
was isolated from each of 14 clones, digested with BsmI or
EcoRI, and analyzed by Southern blots probed with the neo
gene (Fig. 2). BsmI digests inside each of the TRs, so intact, 
integrated vector genomes should contain a 2.6-kb BsmI frag- 
ment (Fig. 1). Figure 2A shows that 10 of the 14 clones had a 
fragment of the expected size. Clones 2, 6, 8, and 9 had dele- 
tions or rearrangements. The high-molecular-weight vector 
band in clone 8 DNA is faint, presumably due to a partial 
deletion of vector sequences. Clone 10 had a second, smaller 
fragment containing at least a portion of the neo gene. Quan- 
titative analysis of the blots was consistent with single-copy 
integrated proviruses for all the clones except 8, 12, and 14, 
with the latter two clones containing four to five copies/cell. 
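
The minimal integration frequency quoted above follows directly from the figure of about 3 x 10^4 particles per stable colony. The sketch below does that arithmetic. It also projects colony numbers for a given cell number and multiplicity of infection, which assumes a linear dose response that the text does not claim. The function names are hypothetical.

```python
PARTICLES_PER_COLONY = 3e4  # figure reported in the text


def min_integration_frequency(particles_per_colony=PARTICLES_PER_COLONY):
    """Stable colonies obtained per intact vector particle."""
    return 1.0 / particles_per_colony


def expected_colonies(cells, moi, particles_per_colony=PARTICLES_PER_COLONY):
    """Projected stable colonies, assuming yield scales linearly with dose."""
    return cells * moi / particles_per_colony


print(min_integration_frequency())
# Hypothetical plating of 1e6 cells at 1.7 particles per cell.
print(expected_colonies(1e6, 1.7))
```

The first value, roughly 3.3 x 10^-5 colonies per particle, is a lower bound on integration frequency in the sense used in the text, since only integrants that confer stable G418 resistance are counted.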

Digestion of the transduced HeLa clones with EcoRI pro-
duced bands of various sizes ranging from 4 to 23 kb (Fig. 2B).
As the AAV-SNori vector genome does not contain an EcoRI 
site, this finding is consistent with random integration. Al- 
though two BsmI vector fragments were present in the DNA 
from clone 10 (Fig. 2A), only a single EcoRI fragment was
detected, suggesting that both BsmI fragments were located at 
a common integration site. 

Provirus recovery as plasmids. Integrated vector proviruses 
were recovered from the transduced HeLa clones by digestion 
with EcoRI, circularization with DNA ligase, and transfer into
E. coli. In this strategy, the EcoRI fragments containing intact



Vol. 71, 1997 



AAV VECTOR INTEGRATION JUNCTIONS 8431 



FIG. 2. Southern blots of transduced HeLa clones. Ten micrograms of
genomic DNA from each HeLa clone (lanes 1 through 14) was digested with
either BsmI (A) or EcoRI (B), electrophoresed through a 1.2% agarose gel,
transferred to a nylon membrane, and probed with internal vector sequences to
detect integrated AAV-SNori proviruses. Lane HeLa, 10 µg of nontransduced
HeLa DNA; lane HeLa + pASNori2, 10 µg of nontransduced HeLa DNA with
10 pg of pASNori2 plasmid DNA. This image was prepared by using Adobe
Photoshop (Mountain View, Calif.) software.



vector proviruses also contain flanking chromosomal DNA, 
and their expected sizes can be predicted from the Southern 
analysis in Fig. 2B. Ligation at low DNA concentration en- 
hances circularization of individual EcoRI fragments. The cir- 
cularized fragments containing vector proviruses are then 
propagated as bacterial plasmids after electroporation of bac-
teria and selection with kanamycin. We recovered proviral
plasmids from 9 of 14 HeLa transductants. Plasmids could not 
be recovered from the five other HeLa clones, even after 
repeated transformations (Table 1). This could be due to de- 
letion or rearrangement of a region essential for plasmid func- 
tion in bacteria but not required for G418 resistance in mam- 
malian cells, such as the p15A origin or Tn5 promoter. Four of
the five HeLa clones containing proviruses that could not be
recovered had an altered internal vector BsmI fragment by
Southern analysis (Fig. 2A), consistent with vector rearrange- 
ments. 
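
The recovery strategy can be mimicked in silico as a rough illustration. The sketch assumes a linear stretch of genomic sequence, cuts it at every EcoRI site (GAATTC, cut after the first G), and returns the fragment that carries the whole vector. Circularizing that fragment would give a plasmid of the same length. The function names and the short sequences are hypothetical.

```python
ECORI_SITE = "GAATTC"


def ecori_fragments(seq):
    """Split a linear sequence at every EcoRI site (cut between G and AATTC)."""
    cuts = []
    start = seq.find(ECORI_SITE)
    while start != -1:
        cuts.append(start + 1)
        start = seq.find(ECORI_SITE, start + 1)
    bounds = [0] + cuts + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:])]


def recoverable_fragment(genomic, vector):
    """Return the EcoRI fragment containing the intact vector, or None."""
    if ECORI_SITE in vector:
        raise ValueError("strategy requires a vector without EcoRI sites")
    for fragment in ecori_fragments(genomic):
        if vector in fragment:
            return fragment
    return None


# Hypothetical vector and flanking sequences (illustration only).
vector = "TTGACCATGGCATTGA"
genomic = "AAGAATTCCC" + vector + "GGGAATTCTT"
fragment = recoverable_fragment(genomic, vector)
print(fragment, len(fragment))
```

The length of the returned fragment corresponds to the band size expected on the EcoRI Southern blot, which is why the blot can be used to predict the size of each recovered plasmid.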

Recovered plasmids were isolated from several of the kana- 
mycin-resistant bacterial colonies and digested with EcoRI 
(Table 1). Most of these contained a single EcoRI site and 
were the size expected from Southern analysis (Fig. 2B). 
Transformation efficiencies varied considerably and did not 
appear to correlate with plasmid size. Only those plasmids with 
a single EcoRI fragment of the predicted size were considered 
correct. Of the plasmids that did not appear correct, 12 of 13 
contained one or more extra EcoRI fragments in addition to 
the predicted fragment, which were presumably acquired by 
intermolecular ligation. Multiple proviral EcoRI fragments re- 
covered from each cell line had the same restriction pattern, 
suggesting that major DNA rearrangements had not occurred 



TABLE 1. Recovered provirus plasmids* 

HeLa    Size of BsmI    Size of EcoRI   Transformation    No. of      No. of
clone   fragment (kb)   fragment (kb)   efficiency/µg     plasmids    plasmids
                                        of DNA            analyzed    correct

 1      2.6             19              <0.17             0           0
 2      2.3             [illegible]     <0.17             0           0
 3      2.6             16              2.5               5           5
 4      2.6             4.7             101               5           5
 5      2.6             5.5             68                5           4
 6      2.3             7               <0.17             0           0
 7      2.6             17              18                5           3
 8      6.5             11              <0.17             0           0
 9      3.3             16              <0.12             0           0
10      2.6, 1.9        9               2                 4           2
11      2.6             20              9.5               5           5
12      2.6             18              4                 8           1
13      2.6             4               1.5               3           3
14      2.6             5.5             4.5               5           4

* Integrated proviruses were recovered from transduced HeLa clones and 
transformed into bacteria. The sizes of fragments from Southern blots (Fig. 2), 
the transformation efficiency, and the number of plasmids that had a single 
EcoRI fragment (no. of plasmids correct) are listed for each HeLa clone. 
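
The recovery figures quoted in the text (9 of 14 clones yielding plasmids, and 13 plasmids that did not appear correct) can be checked against the per-clone counts in Table 1. The following is a minimal sketch of that tally; the dictionary layout and the function name `summarize` are illustrative choices, not part of the original analysis.

```python
# Per-clone counts copied from Table 1:
# HeLa clone -> (no. of plasmids analyzed, no. of plasmids correct).
TABLE_1 = {
    1: (0, 0), 2: (0, 0), 3: (5, 5), 4: (5, 5), 5: (5, 4),
    6: (0, 0), 7: (5, 3), 8: (0, 0), 9: (0, 0), 10: (4, 2),
    11: (5, 5), 12: (8, 1), 13: (3, 3), 14: (5, 4),
}


def summarize(table):
    """Tally recovered clones and correct/incorrect plasmids (hypothetical helper)."""
    recovered = sorted(c for c, (analyzed, _) in table.items() if analyzed > 0)
    analyzed = sum(a for a, _ in table.values())
    correct = sum(k for _, k in table.values())
    return {
        "recovered_clones": recovered,
        "n_recovered": len(recovered),
        "analyzed": analyzed,
        "correct": correct,
        "incorrect": analyzed - correct,
    }


summary = summarize(TABLE_1)
print(summary)
```

The tally gives nine clones with recovered plasmids and 13 plasmids that were not scored as correct, in agreement with the counts cited in the text.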



during the transfer to bacteria. Further restriction digests con- 
firmed that each recovered plasmid contained a single vector 
provirus copy (data not shown). 

Because the TR sequences might be unstable during provi- 
rus recovery and replication in bacteria, we performed control 
transformations to ensure that our assay could recover plas- 
mids with intact TRs. Plasmid pASNori1 contains the entire 
AAV-SNori vector genome including both TRs and depends 
on the p15A replication origin in the vector sequences for 
replication, making it similar in structure to the recovered 
plasmids. pASNori1 was linearized with EcoRI, gel purified, 
mixed with nontransduced, EcoRI-digested HeLa DNA, and 
subjected to the same recovery procedure as integrated provi- 
ruses. Of 10 recovered plasmids, 8 contained the expected 
unique EcoRI site. Two recovered plasmids contained extra 
EcoRI fragments in addition to the expected pASNori1 frag- 
ment. The restriction digest patterns of all eight correct plas- 
mids were identical to the original pASNori1 plasmid after 
digestion with four restriction enzymes that have sites in the 
TRs, confirming that the TRs remained intact during our re- 
covery procedure (data not shown). 

Sequence analysis of integration junctions and flanking 
genomic DNA. One example of each correct recovered plasmid 
was sequenced by using primers P1 and P2 (Fig. 1). These 
primers bind to locations in the vector genome about 70 nu- 
cleotides inside each of the TRs. The sequenced regions begin 
at internal vector sequences and then proceed outward 
through the TR region, the integration junction site, and flank- 
ing human DNA. Of the nine plasmids analyzed, flanking hu- 
man sequence information of 400 or more nucleotides was 
obtained from 14 of the 18 possible integration junctions. 
Three other junctions (12L, 12R, and 14L) contained nearly 
complete TRs that stalled the sequencing polymerase and one 
(10R) contained TR sequences joined to Tn5 sequences (see 
below). The sequences around each junction site are shown in 
Fig. 3 and are designated as from the left TR (L) or right TR 
(R) of each recovered provirus. Figure 3A displays junction 
sequences that could be aligned in the "flip" orientation; Fig. 
3B shows a right junction sequence that was in the "flop" 
orientation. In the flop orientation, the B and C regions are 
inverted compared to the flip orientation (D-A'-B-C-A instead 
of D-A'-C-B-A) (29). Figure 3C shows a left junction sequence 
[FIG. 3 image: alignment of junction sequences, panels A to C; see legend below.]

FIG. 3. AAV vector integration junctions. Portions of AAV vector sequences are shown in uppercase. Up to 62 nucleotides of flanking sequences are shown for 
each junction in lowercase and are aligned beneath the intact vector sequence (TR). Nucleotide positions are numbered beginning at the end of the intact TR. The 
sequences are designated as from the left (L) or right (R) of each numbered provirus clone. The Rep binding site (RBS) (9, 34) is boxed. Subdomains of the TRs (A, 
B, C, A', and D) are indicated above the sequences. The flanking sequence is HeLa DNA, except for junction 10R, where it is Tn5 DNA (see text). (A) Sequences 
aligned to the right TR in the flip orientation and internal vector DNA. (B) Sequence aligned to the flop orientation of the TR. (C) Sequence aligned to the left internal 
portion of the vector. 



with the recombination site internal to the TR. Of the six 
junctions containing part of the B or C region in which the 
orientation could be determined, five were in the flip orienta- 
tion. 
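
The flip and flop orientations described above differ only in the order of the B and C domains, so a junction's orientation can be assigned only when the retained TR sequence extends into one of those domains. The sketch below illustrates that rule; the domain labels follow the text, while the function names and the idea of representing a junction as a list of retained domains (read outward from the D domain) are assumptions made for illustration.

```python
# Domain order of the TR read outward from internal vector sequence,
# as given in the text: flip is D-A'-C-B-A, flop is D-A'-B-C-A.
FLIP = ("D", "A'", "C", "B", "A")


def invert_b_and_c(domains):
    """Swap the B and C domains, leaving all other domains in place."""
    swap = {"B": "C", "C": "B"}
    return tuple(swap.get(d, d) for d in domains)


FLOP = invert_b_and_c(FLIP)


def orientation(retained):
    """Classify a (possibly truncated) TR from its retained domains.

    `retained` lists the domains present at a junction, starting at D.
    Orientation is undetermined unless part of B or C is present.
    """
    retained = tuple(retained)
    if len(retained) < 3:
        return "undetermined"
    if retained == FLIP[:len(retained)]:
        return "flip"
    if retained == FLOP[:len(retained)]:
        return "flop"
    return "undetermined"


print(FLOP)
print(orientation(["D", "A'", "C"]))
print(orientation(["D", "A'"]))
```

A junction that retains only D and A' is reported as undetermined, which is why only a subset of the sequenced junctions could be assigned an orientation.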

The sequence data show that no two proviruses had the 
same junction site. Junctions were dispersed throughout the 
TRs and internal vector sequences. We could not identify a 
common sequence motif in vector or chromosomal DNA at the 
junction sites. None of the flanking sequences were similar to 
each other, and there were no matches with any gene in the 
GenBank or EMBL database. The only known sequence ele- 
ment found was a single Alu repeat in the flanking human 
DNA of junction 4R. 

Comparison of the flanking sequences with two sequences 
from the chromosome 19 integration locus, AAVS1 (GenBank 



accession no. S51329) and pRE2 (provided by R. J. Samulski), 
produced no matches, indicating that the clones did not inte- 
grate into the site-specific integration locus of wild-type AAV2. 
This observation was confirmed by probing Southern blots of 
HeLa clone DNAs with a portion of the site-specific integra- 
tion locus, which showed no rearrangements at this locus in any 
of the 14 transductants analyzed (data not shown). In addition, 
analysis of the flanking sequences revealed no binding sites for 
the AAV Rep protein (9, 34), in contrast to the chromosomal 
site-specific integration locus (16). 
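
A search of flanking sequence for Rep binding sites could be sketched as follows. The criteria actually used for this analysis are not stated here, so the example assumes, purely for illustration, that an RBS-like element can be approximated by three or more tandem GCTC repeats on either strand (the repeat found in the AAV TR). The function names and the `min_repeats` parameter are hypothetical.

```python
import re

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_rbs_like(seq, min_repeats=3):
    """List (strand, start, match) for tandem GCTC repeats on either strand.

    Assumption: an RBS-like element is modeled as >= min_repeats tandem
    GCTC tetranucleotides. Start positions on the "-" strand refer to the
    reverse-complemented sequence.
    """
    pattern = re.compile(r"(?:GCTC){%d,}" % min_repeats)
    forward = seq.upper()
    hits = []
    for strand, s in (("+", forward), ("-", reverse_complement(forward))):
        for m in pattern.finditer(s):
            hits.append((strand, m.start(), m.group()))
    return hits


# Vector TR sequence containing the GCTC repeat, and one flanking sequence.
tr_fragment = "GCCACTCCCTCTCTGCGCGCTCGCTCGCTCACTGAGGCC"
flank = "cagaactttgggaggaggatcacttaaggcca"

print(find_rbs_like(tr_fragment))
print(find_rbs_like(flank))
```

A stricter or looser motif definition would change which sequences are flagged, so the threshold should be treated as a tunable assumption rather than a fixed property of Rep binding.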

Two of the junctions (4L and 5R) had structures that were 
the result of rearrangements. A schematic representation of 
these proviruses is shown in Fig. 4. The rearrangements in 
sequences 4L and 5R were in the regions just inside each TR 
where a 44-nucleotide region is repeated in the left and right 
FIG. 4. Structures of rearranged sequences. The 4L and 5R provirus junction 
sequences are diagrammed to show where internal vector sequences recombined. 
The positions of left and right internal vector sequences, the common SV40 
polyadenylation site found in either end (SV pA), the TR domains (D, A', and 
C), and flanking human DNA are shown. 



portions of the vector. This region contains the SV40 polyad- 
enylation signal and was duplicated as a result of vector con- 
struction. In the case of junction 4L, sequencing from the left 
P1 primer (Fig. 1) read a portion of left internal sequences, the 
common SV40 sequence, and then right internal vector se- 
quences followed by a portion of the right TR and flanking 
chromosomal DNA. A similar finding was observed in junction 
5R reading from the right P2 primer. Apparently homologous 
recombination occurred between these sites and produced an 
inverted internal portion of the vector. Recombination could 
have happened during the production of the AAV-SNori 
stocks or the transduction of HeLa cells. 

Junction (10R) was not joined to human DNA. This junction 
contained the A' region of the right TR joined to a second 
copy of vector Tn5 DNA. The sequence around the junction 
site is shown in Fig. 3. This explains the two neo-hybridizing 
bands detected in Fig. 2A for HeLa clone 10. Flanking HeLa 
DNA sequence was not obtained for this junction and must lie 
beyond the region sequenced. 

When the TR is drawn in its predicted secondary structure, 
the recombination sites of the integrated proviruses are scat- 
tered throughout the repeat (Fig. 5). Except for some potential 
clustering of recombination events between the C and A' re- 
peat domains, our data set does not suggest that secondary 
structure in the TRs directs a specific integration process. 
Additional junctions will need to be sequenced to determine if 
the four junctions observed between C and A' represent a 
recombination hotspot. 

Flanking DNA sequence was not obtained for the remaining 



3 of the 18 possible integration junctions recovered (12L, 12R, 
and 14L). The sequences obtained for these junctions ended 
abruptly in the A' region of the TR, where the sequencing 
polymerase apparently stalled. Based on our sequencing of 
intact AAV TRs, this is consistently observed when the sec- 
ondary structure of the inverted repeat remains intact, and the 
polymerase stalls at the first base that can pair in the secondary 
structure (data not shown). If we assume that this also oc- 
curred while we were sequencing these three junctions, then 
we can place the end of the vector provirus DNA at the last 
base of the predicted secondary structure that would have 
stalled the sequencing polymerase, in the corresponding posi- 
tions of the A domain. These junctions are indicated with 
asterisks in Fig. 5. 

Provirus rescue by wild-type infection. Integrated wild-type 
AAV proviruses can be excised and amplified by infection with 
adenovirus (3, 8, 18, 26), and a similar rescue of vector provi- 
ruses can occur by infection with wild-type AAV and adeno- 
virus (19, 30, 36, 42). We screened each of the nine transduced 
HeLa clones that contained proviruses recovered in bacteria to 
see if their TR structures would correlate with the potential for 
excision and amplification. Only clones 12 and 14 contained 
proviruses that could be rescued by coinfection with wild-type 
AAV and adenovirus, and none of the clones could be rescued 
by adenovirus alone, as expected if no rep gene was present in 
the cells. Figure 6 shows representative results from this ex- 
periment for clones 12, 13, and 14. The two rescuable provi- 
ruses had the most intact TRs of all the proviruses that we 
analyzed (Fig. 5), and the nearly complete repeats on both 
ends of clone 12 correlated with more efficient rescue than 
those of clone 14. 



DISCUSSION 

We have developed a shuttle vector system to study the 
structure of integrated AAV vector proviruses and analyze 
their flanking DNA. Fourteen independent, G418-resistant 
HeLa clones were isolated after transduction with an AAV 
vector containing the neo gene, and integrated vector provi- 
ruses were recovered as bacterial plasmids from nine of these 
lines. Sequence analysis of the proviral junction sites showed 
that each integration event occurred at a different chromo- 
somal site, and none of these sites were at the chromosome 19 
FIG. 5. Junction sites and TR secondary structure. Junction recombination sites within the TRs of recovered proviruses are shown in relation to the secondary 
structure of the TR. All sequences were adapted to be displayed in the flip orientation. Junctions 12L, 12R, and 14L are presumed junction sites because the actual 
sequences ended in the corresponding positions in the A' region, suggesting that the TR was intact up to that point. The positions of the Rep binding site (RBS) (9, 
34) and terminal resolution site (TRS) (40) are shown. 
FIG. 6. Provirus rescue. Transduced HeLa clones were infected with wild- 
type AAV2 (AAV) and/or adenovirus (Ad) or left uninfected, as indicated. 
Episomal DNA was isolated from the cells 44 h later, electrophoresed through an 
alkaline agarose gel, and subjected to Southern analysis using a neo gene probe. 
The positions of the monomer (m) and dimer (d) replicative vector forms are 
indicated. The standards are 400 and 40 pg of the 2.6-kb BsmI fragment of 
pASNori2. The results for HeLa clones 12 to 14 are shown (14-h exposure), with 
HeLa clone 12 also shown at a shorter exposure (1 h). This image was prepared 
by using Adobe Photoshop software. 



site-specific integration locus of wild-type AAV. Each recom- 
bination event was at a different position in the vector genome, 
and no fully intact vector proviruses were recovered. There was 
no sequence homology between the vector and flanking human 
DNA and none between the different flanking sequences re- 
covered. These results demonstrate that AAV vector integra- 
tion occurs at random chromosomal locations and that each 
integrated provirus contains variable amounts of the complete 
vector genome. 

The recovered proviruses were joined to human DNA se- 
quences at nucleotide positions throughout the vector TRs and 
at internal vector sequences in one case. Because our system 
required the presence of functional promoters, neo gene, and 
bacterial origin, the ends of the vector were the only portions 
that could have been disrupted by integration and still pro- 
duced a functional provirus capable of being recovered. South- 
ern analysis of the proviruses that could not be recovered 
showed that four of five contained rearrangements within in- 
ternal vector sequences, suggesting that the junctions occurred 
inside the BsmI sites (Fig. 2A). While our results are consistent 
with random recombination throughout the vector genome, we 
cannot exclude a partial preference for TR recombination 
sites, since more junctions should have been observed in the 
approximately 170 nonessential internal vector nucleotides if a 
completely random representation had been recovered. 
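
The argument that too few internal junctions were recovered can be made concrete with a simple binomial calculation. The sketch below rests on several assumptions that are not stated in the text: that each TR is 145 nucleotides long (the length of a wild-type AAV2 TR), that the approximately 170 nonessential internal nucleotides are the total for both ends, that junctions would fall uniformly over these positions, and that all 18 possible junctions count as independent observations. It is an illustration of the reasoning, not the authors' analysis.

```python
from math import comb


def prob_at_most(k, n, p):
    """Binomial probability of observing k or fewer successes in n trials."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


TR_LENGTH = 145              # assumption: nucleotides per TR
N_TRS = 2
INTERNAL_NONESSENTIAL = 170  # from the text (approximate; assumed total)
N_JUNCTIONS = 18             # possible junctions from nine recovered proviruses
OBSERVED_INTERNAL = 1        # the single junction at internal vector sequences

# Fraction of recoverable junction positions that lie in internal sequence.
p_internal = INTERNAL_NONESSENTIAL / (INTERNAL_NONESSENTIAL + N_TRS * TR_LENGTH)
expected_internal = N_JUNCTIONS * p_internal
p_value = prob_at_most(OBSERVED_INTERNAL, N_JUNCTIONS, p_internal)

print(f"fraction internal: {p_internal:.3f}")
print(f"expected internal junctions: {expected_internal:.1f}")
print(f"P(<= {OBSERVED_INTERNAL} internal junction): {p_value:.4f}")
```

Under these assumptions between six and seven internal junctions would be expected, compared with the one observed, which is the sense in which a partial preference for TR recombination sites cannot be excluded. Different assumptions about TR length or the number of independent junctions would shift the numbers but not the direction of the comparison.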

Based on Southern analysis and restriction mapping of re- 
covered plasmids, 11 of 14 HeLa transductants contained one 
integrated vector genome per cell, and clone 10 contained a 
partially duplicated vector genome. Two clones (12 and 14) 
had a stronger hybridization signal consistent with four to five 
vector copies per cell. In these clones, the vector signal was 
present in unique bands after digestion outside of vector DNA 
with EcoRI, and their recovered plasmids contained one copy 
of the vector provirus within the EcoRI fragment. This finding 
suggests that amplification of the provirus and flanking DNA 
contained in the EcoRI fragment occurred after integration of 
the vector, perhaps during the selection process in G418. It is 
possible that proviral sequences were responsible for this am- 
plification process, as the AAV TR has been shown to function 
as a replication origin in carcinogen-treated mammalian cells 



(46), and these two proviruses contained the most complete 
TR structures of all those recovered. 

Both concatemeric and single-copy vector proviruses have 
been described in prior studies of vector integration based on 
Southern analysis of transduced, cultured cells (11, 27, 30, 33, 
36, 44). More recently, animal studies have demonstrated the 
presence of concatemeric vector sequences in high-molecular- 
weight DNA by PCR after in vivo vector administration (14, 
20, 45). The conditions favoring concatemeric versus single- 
copy integration are unknown and could include cell-type- 
specific functions, multiplicity of infection, contaminating rep+ 
helper virus particles, and differences in selection conditions. 
None of the G418-resistant HeLa cell transductants that we 
analyzed contained concatemeric vector proviruses, over a 
range of infection multiplicities of 0.17 to 170 vector particles/ 
cell, with <0.3% wild-type AAV contamination (wild-type ge- 
nomes per vector genome). Our results are in agreement with 
previous studies suggesting that concatemeric vector provi- 
ruses are preferentially rescued by coinfection with wild-type 
AAV and adenovirus (30, 36), as most of the integrants that we 
analyzed could not be rescued. Although the two transductants 
that could be rescued were not present as concatemers joined 
at the TRs, they were present at multiple copies per cell. These 
proviruses also had the smallest deletions in their TRs, with at 
least one repeat containing a complete copy of each repeat 
domain sequence. This proviral structure would be expected to 
regenerate a fully intact TR by gene conversion as previously 
shown for wild-type AAV (37). 

There were both differences and similarities between vector 
integration in our system and integration by wild-type AAV2. 
Unlike the random vector integration that we observed, two- 
thirds of wild-type proviruses studied in latently infected hu- 
man cells were found in the site-specific integration locus on 
chromosome 19 (25, 38). Although these wild-type integration 
events occurred at a common locus, the junctions were at 
different nucleotide positions within the locus and at different 
positions in the AAV TRs (23, 38). This heterogeneity in 
junction sites within the TRs is similar to our findings with 
vector integration. Site-specific integration into chromosome 
19 requires the presence of a Rep binding site in the chromo- 
somal target DNA (16, 28), and no Rep binding sites were 
identified in the human flanking sequences that we recovered. 
This finding suggests that vector integration is not mediated by 
the Rep protein, including Rep protein molecules that might 
be contained in the entering vector particle. Presumably, either 
Rep expression in the transduced cell is necessary for site- 
specific integration or wild-type virions contain functional Rep 
molecules that are absent from vector virions. A related ob- 
servation may be that integrated wild-type proviruses are often 
found as concatemers (8, 22, 26), which we did not observe for 
vector proviruses. This observation supports the hypothesis 
that site-specific integration involves Rep-dependent replica- 
tion of the chromosome and virus during integration (28, 43), 
while Rep-independent vector integration occurs by another, 
nonreplicative mechanism. 

Our data suggest that AAV vector integration is a nonho- 
mologous recombination reaction. While no clear recombina- 
tion hotspots were identified in the vector genome, there may 
have been a clustering of junctions between the C and A' 
regions of the TR, suggesting a possible role for Holliday-like 
recombination intermediates in the reaction. Although the ex- 
act mechanism has not been determined, the recombination 
event is presumably mediated by cellular proteins. Cleavage of 
the chromosomal preintegration site may be due to host nucle- 
ases or DNA damage. The requirement for chromosomal 
breaks could in part explain why DNA-damaging agents in- 
crease transduction by AAV vectors (2, 32). The lack of spe- 
cific vector recombination enzymes also suggests why vector 
integration is such an inefficient process, requiring hundreds to 
thousands of vector particles per integration event (15, 19, 33, 
36). One explanation for the consistent recovery of incomplete 
proviruses is that the entering, linear vector genomes are par- 
tially degraded before the integration event. Despite our de- 
termination of the structure of integrated proviruses, key steps 
in the recombination process have yet to be elucidated, includ- 
ing the role of vector second-strand synthesis, which is also 
associated with increased transduction rates (12, 13), and 
whether the recombination reaction involves single-stranded 
or double-stranded vector molecules. The nature of the chro- 
mosomal preintegration site is completely unknown, and there 
could be large deletions or even interchromosomal crossover 
events associated with vector integration. 

The uncertain nature of the AAV vector integration process 
has important consequences for gene therapy. In contrast to 
retroviral vectors, which integrate in a precise and predictable 
reaction mediated by the viral integrase protein, each AAV 
vector integration event is likely to produce a different, incom- 
plete provirus, with unknown effects on the host chromosomes 
involved. Many of the integrated vectors could contain dele- 
tions of the transgene being delivered, with resulting decreases 
in transduction efficiencies. As our understanding of the pro- 
cess improves, methods for increasing the efficiency and pre- 
dictability of vector integration that improve the prospects for 
gene therapy by AAV vectors may be developed. 
A model system using an episomal Epstein-Barr virus shuttle vector was recently developed to study the 
adeno-associated virus (AAV) site-specific integration event in chromosome 19q13.3-qter (C. Giraud, E. 
Winocour, and K. I. Berns, Proc. Natl. Acad. Sci. USA 91:10039-10043, 1994). In this study, we analyze the 
recombinant junctions generated after integration of the AAV genome into an Epstein-Barr virus shuttle vector 
carrying 8.2, 1.6, or 0.51 kb of the chromosome 19 preintegration sequence (AAVS1 locus). In most of the 
recombinants, one end of the viral genome was joined to a portion of the AAVS1 DNA previously shown to be 
a minimum target for AAV integration. Within this AAVS1 segment, the AAV insertion points were strikingly 
clustered around a binding site for the AAV regulatory protein. In all cases, the second junction with AAV 
occurred with vector DNA outside of the AAVS1 segment. With respect to the viral genome, one junction with 
the shuttle vector DNA occurred either within the AAV inverted terminal repeat (itr), or near the P5 promoter, 
approximately 100 nucleotides distal to a modified itr. The modified itr in 5 of 11 recombinants involved a 
head-to-tail organization. In one such instance, the AAV insert contained slightly more than one genome 
equivalent arranged in a head-to-tail manner with a junction close to the P5 promoter; the AAV insert in this 
recombinant episome could be rescued by adenovirus infection and replicated to virus particles. The signifi- 
cance of the head-to-tail organization is discussed in terms of the possible circularization of AAV DNA before 
or during integration. 



Adeno-associated virus (AAV) has been recognized recently 
as a likely vector for gene delivery in mammalian cells (4, 21, 
31, 43). One particularly attractive feature of this virus is its 
ability to integrate with a high degree of specificity in the 
AAVS1 locus (preintegration sequence) on chromosome 
19q13.3-qter (23-25, 33, 36). By using an Epstein-Barr virus 
(EBV) shuttle vector (27), we have previously shown that an 
8.2-kb AAVS1 preintegration DNA, propagated as an extra- 
chromosomal episome, is a target for site-specific AAV DNA 
integration (12). Sequential deletion of the 8.2-kb sequence 
identified a minimum sequence of 510 nucleotides (nt) able to 
direct integration. Several signals potentially involved in the 
integration process were identified on this short piece of DNA: 
a Rep 78/68 binding site (6, 42), a potential terminal resolution 
site (38), and an M26 motif (32, 37). This 510-bp fragment was 
also associated with rearrangements of the shuttle vector ge- 
nome (12). 

To characterize the AAV site-specific integration event, we 
have analyzed the products of integration in EBV vectors car- 
rying the AAVS1 preintegration locus. Sequencing of the junc- 
tions between AAV and the vector has shown that a highly 
prevalent (in 80% of the recombinants) insertion point is lo- 
cated close to the previously described Rep binding motif and 
terminal resolution sequence at the 5' end of the AAVS1 
preintegration locus. This finding provides further evidence 
underlining the importance of these AAV recognition signals 
in site-specific integration. In addition, the sequencing data 
have revealed that 5 of the 11 recombinant structures analyzed 
contain a head-to-tail junction. 

MATERIALS AND METHODS 

Production and selection of EBV recombinant vectors. The C17 cell lines (3) 
propagating the p220.2 EBV shuttle vectors (9, 44) carrying segments of AAVS1 
DNA, designated C17-p220.2(AAVS1 kb 0-8.2), C17-p220.2(AAVS1 kb 0-1.6), 
and C17-p220.2(AAVS1 kb 0-0.51), have been described previously (12). These 
cell lines were infected at several passage levels with AAV at an input multiplicity 
of 20 infectious units per cell. The extrachromosomal DNA was extracted 48 h 
postinfection with a slightly modified Hirt extraction procedure (18) and trans- 
fected into the SURE strain of Escherichia coli (14; Stratagene). The recombi- 
nants were isolated after colony hybridization of a replica filter with a nonradio- 
active single-stranded AAV genomic DNA probe as previously described (12). 

Oligonucleotide probes and hybridization. AAV oligonucleotides (see Fig. 1) 
(AAV1, 5'-CTCACgTgACCTCTAATACAgg-3'; AAV2, 5'-gggACCTTAATCACAATCTCg-3'; 
AAV3, 5'-CTTCTCggCCACggTCAgg-3'; AAV4, 5'-gAAATgTCCTCCACgggCTg-3'; 
AAV5, 5'-CTTgTCgAgTCCgTTgAAgg-3'; AAV6, 5'-CATgAATCCTCTCATCgACC-3'; 
AAV7, 5'-gTggAgATCgAgTgggAgC-3'; AAV8, 5'-ggCACCAgATACCTgACTCg-3'; 
AAV9, 5'-CTAgTTTCCATggCTACg-3'; AAVD, 5'-ggAACCCCTAgTgATggAg-3') and 
AAVS1 oligonucleotides (oligo 1, 5'-ACTTgCTAgTATgCCgTggg-3'; oligo 2, 
5'-CTACCTgCCCAgCACACC-3'; oligo 3, 5'-CATCCTCTCCggACATCg-3'), synthesized 
by the Oligo Etc Company, were labeled with the Dig-Oligonucleotide 3' end 
Labeling Kit (Boehringer Mannheim). Vector DNA (100 ng) was dotted onto a nylon 
membrane (Hybond-N; Amersham), denatured with NaOH at 0.5 M plus NaCl at 1.5 
M, neutralized with 0.5 M Tris-HCl (pH 8.0)-1.5 M NaCl, and UV cross-linked. 
The membranes were subsequently prehybridized for 2 h at 68°C and hybridized 
for 3 h at a temperature determined as optimum for each oligonucleotide (5 to 20°C 
under the melting temperature). Specific hybridization was detected with the 
Genius system kit (Boehringer Mannheim). 
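
The hybridization temperature for each probe is described only as 5 to 20°C under its melting temperature; how the melting temperature was obtained is not stated. As a rough illustration, the sketch below estimates it with the Wallace rule (2°C per A or T, 4°C per G or C), which is a common approximation for short oligonucleotides and is an assumption here, not the documented method. The function names are hypothetical; the two sequences are taken from the probe list above.

```python
def wallace_tm(seq):
    """Estimate melting temperature (deg C) with the Wallace rule.

    Assumption: Tm = 2 * (A + T) + 4 * (G + C). This is a rough rule of
    thumb for short oligonucleotides, not the method used in the study.
    """
    s = seq.upper()
    at = s.count("A") + s.count("T")
    gc = s.count("G") + s.count("C")
    return 2 * at + 4 * gc


def hybridization_window(seq, low_offset=20, high_offset=5):
    """Return the (min, max) temperature lying 5 to 20 deg C under the Tm."""
    tm = wallace_tm(seq)
    return (tm - low_offset, tm - high_offset)


PROBES = {
    "AAVD": "ggAACCCCTAgTgATggAg",
    "AAV8": "ggCACCAgATACCTgACTCg",
}

for name, seq in PROBES.items():
    print(name, wallace_tm(seq), hybridization_window(seq))
```

Because the Wallace rule ignores salt concentration and sequence context, the resulting window is only a starting point for the empirical optimization mentioned in the text.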

Sequencing reactions. Plasmid DNA was prepared by the Qiagen kit plasmid 
purification procedure (Qiagen Inc.), and sequencing reactions were performed 
with the Sequenase Quick-Denature plasmid sequencing kit (U.S. Biochemical). 
Oligonucleotides spread throughout the AAV genome (see Fig. 1) were used as 
primers for the sequencing reactions. To determine the junction sequences, an 
additional 15 primers (data available upon request) were utilized. 

PCR for determination of head-to-tail organization of inverted terminal re- 
peat (itr) and adjacent DNA. Plasmid DNA (10 ng) was mixed with 200 pmol 
each of oligonucleotides AAV1 and AAV8. The reaction was performed with the 
Boehringer PCR buffer (final concentrations, 10 mM Tris-HCl [pH 8.4], 50 mM 
FIG. 1. Pattern of AAV sequences integrated in EBV recombinant vectors. (A) Schematic representation of the AAV genome showing the itr, the promoters, 
and the oligonucleotides (arrows) spanning the AAV internal sequences and the D region of the itr used as probes. The gap indicates a disruption in the sequence 
with respect to the scale of the drawing. (B) Summary of dot blot hybridizations between AAV oligonucleotides and EBV recombinant vectors (indicated by the letter 
R followed by a number) isolated from the C17-p220.2(AAVS1 kb 0-8.2), C17-p220.2(AAVS1 kb 0-1.6), and C17-p220.2(AAVS1 kb 0-0.51) cell lines 48 h postinfection 
with AAV. The asterisks show the recombinants which were subsequently sequenced at their junctions with the AAV genome (see Fig. 2 to 5). 



KCl, 1.5 mM MgCl2, and 0.01% gelatin) supplemented by the four deoxynucleo- 
side triphosphates at a final concentration of 100 µM each and 2.5 U of Taq 
DNA polymerase (Boehringer) in a total volume of 100 µl. Thirty cycles (dena- 
turation at 94°C for 45 s, annealing at 58°C for 45 s, and elongation at 72°C for 
1 min) were carried out with a Perkin Elmer Cetus (Norwalk, Conn.) PCR 
apparatus, and 10 µl of the reaction mixture was loaded onto a 1.5% agarose gel. 
Reactions were also performed under the same conditions but with only one of 
the primers. 
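
The reaction described above can be written down as a small data structure, which makes it easy to derive quantities the text leaves implicit, such as the primer concentration and the total time spent at the programmed temperatures. This is a minimal sketch; the class and function names are illustrative, and ramp times between steps are ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    name: str
    temp_c: float
    seconds: int


# Cycling conditions and reaction composition as stated in the text.
CYCLE = (
    Step("denaturation", 94, 45),
    Step("annealing", 58, 45),
    Step("elongation", 72, 60),
)
N_CYCLES = 30
REACTION_VOLUME_UL = 100.0
PRIMER_PMOL = 200.0  # each of AAV1 and AAV8
DNTP_UM = 100.0      # each deoxynucleoside triphosphate


def cycling_minutes(cycle, n_cycles):
    """Total programmed hold time in minutes, excluding temperature ramps."""
    return n_cycles * sum(step.seconds for step in cycle) / 60


def primer_concentration_um(pmol, volume_ul):
    """Final primer concentration; pmol per microliter equals micromolar."""
    return pmol / volume_ul


def dntp_nmol(conc_um, volume_ul):
    """Amount of each dNTP in the reaction, in nanomoles."""
    return conc_um * volume_ul / 1000


print(cycling_minutes(CYCLE, N_CYCLES))
print(primer_concentration_um(PRIMER_PMOL, REACTION_VOLUME_UL))
print(dntp_nmol(DNTP_UM, REACTION_VOLUME_UL))
```

With the stated values, each primer is present at 2 µM and each dNTP at 10 nmol per reaction, and the 30 cycles account for 75 min of hold time.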

Rescue of AAV from recombinant EBV shuttle vectors. C17 cells were infected 
with adenovirus type 2 (Ad2) at an input multiplicity of 10 infectious units per 
cell for 1 h at 37°C and subsequently transfected with 20 µg of purified EBV 
recombinant plasmids by using the cationic lipid N-(1-(2,3-dioleoyloxy)propyl)- 
N,N,N-trimethylammonium methyl sulfate (DOTAP) (10, 39). Forty hours later, 
the low-molecular-weight DNA was isolated (18) and analyzed for AAV DNA 
replication by the Southern blotting procedure. 

To determine AAV capsid antigen synthesis, the cells were harvested 40 h 
after transfection, resuspended in radioimmunoprecipitation assay buffer (50 
mM Tris-HCl [pH 7.5], 150 mM NaCl, 1% Triton X-100, 1% deoxycholate, 0.1% 
sodium dodecyl sulfate [SDS]) supplemented with 2 mM phenylmethylsulfonyl 
fluoride and 1 µg each of antipain, leupeptin, and pepstatin A, and centrifuged 
briefly at 4°C, and the supernatant fraction was kept at -20°C before analysis on 
an SDS-8% polyacrylamide gel electrophoresis gel. After electrophoresis, the 
polypeptides were transferred to a nitrocellulose membrane and immunodetec- 
tion was performed with polyclonal capsid antibodies (kindly provided by Mertyn 
Malkinson) and a peroxidase label conjugate (Sigma). The chromogenic sub- 
strate 3-amino-9-ethylcarbazole (Sigma) was used to detect peroxidase activity. 

To assay for infectious virus production, C17 cells were harvested 40 h post- 
transfection and lysed by several cycles of freeze-thawing. The extract was heated 
for 2 h at 56°C to inactivate adenovirus, clarified by centrifugation, mixed with a 
fresh sample of Ad2, and adsorbed to HeLa cells. At 40 h postinfection, the cells 
were lysed and Ad2 was inactivated as described above. One-tenth of the lysate 
was treated with DNase I (10 µg/ml) for 30 min at 37°C, denatured with 0.8 M 
NaOH, neutralized with 0.8 M NH4C2H3O2, dot blotted to nitrocellulose, and 
hybridized with a 32P-labeled AAV DNA probe. 

RESULTS 

A useful feature of the EBV shuttle vector system is that the 
recombinant products can be retrieved in E. coli for structural 
analysis. Accordingly, we isolated EBV-AAV recombinants 
from cell lines that carried p220.2 shuttle vectors with 8.2-, 1.6-, 
and 0.51-kb segments of AAVS1 DNA and had been infected 
at various passage levels. The recombinants were analyzed with 
respect to AAV DNA content, the sequences at the junctions, 
and the capacity of the AAV genome to be rescued from the 
episomal shuttle vector by adenovirus infection. 

AAV DNA in recombinants. To define the AAV sequences present on the EBV
vectors after infection of the C17-p220.2(AAVS1 kb 0-8.2),
C17-p220.2(AAVS1 kb 0-1.6), and C17-p220.2(AAVS1 kb 0-0.51) cell lines, we
used a set of AAV oligonucleotides spanning the AAV genome to hybridize dot
blots of the recombinant DNAs (Fig. 1). Some of the recombinants (9 of 43)
reacted positively to all of the probes used for hybridization, suggesting
that the entire AAV genome might be present. No evidence of the presence of
several separate inserts of AAV DNA or several copies of the same DNA was
found by restriction digestion (data not shown). Most of the recombinants
(32 of 43) had deletions in the rep genes; the deletions extended to the
capsid genes in at least 20 of these cases. Thirty-six of forty-three
recombinants reacted positively to the oligonucleotide AAVD located in the D
region of the itr. The D region of the AAV itr is outside of the 125-bp
symmetrical sequences able to fold into a T-shaped structure. Hence, the itr,
or at least part of it, is present in most of the recombinants. The pattern
of the AAV sequences found in each recombinant (Fig. 1) was used to select
AAV oligonucleotides for sequencing of the junctions between the viral and
vector DNA segments.

[Fig. 2 panels: (a) AAV/AAVS1 junction sequences for recombinants R1 (1),
R2 (2), R19 (3), R24 (4), R27 (5), R31 (6), R35 (7), R38 (8), R39 (9), and
R40 (10); (b) junction positions on the AAV genome (4.68 kb); (c) junction
positions on the AAVS1 segment (8.2 kb), with the M26 motif, CRE, trs, and
Rep binding site marked.]

FIG. 2. Junctions between AAV and AAVS1 sequences. (a) Sequences at the
nucleotide level. Boxed sequences at the junctions are common to both viral
and vector genomes. A junction number (in parentheses) was given to each
recombinant, and the same number is used in panels b and c. The horizontal
dotted lines separate the three groups of recombinants isolated from three
different cell lines (Fig. 1). (b) Schematic representation of the junctions
with respect to the AAV genome. (c) Schematic representation of the junctions
with respect to the AAVS1 segment of chromosome 19. The AAVS1 sequence
(8.2 kb) shows a CpG island, a region corresponding to a partial cDNA clone,
and a minisatellite repetitive DNA sequence. The enlargement of the first
510 nt at the 5' end of AAVS1 shows the M26 motif from Saccharomyces
cerevisiae, a putative terminal resolution site (trs), and a Rep binding
site. CRE, cyclic AMP response element.

AAV and vector sequences at recombinant junctions. The 
junctions of 11 recombinants (noted by asterisks in Fig. 1) were 
sequenced by using multiple oligonucleotides as primers (see 
Materials and Methods and Fig. 2 to 5).

(i) Junctions between AAV and AAVS1 sequences. A junction between AAV DNA
and the AAVS1 DNA segment of the p220.2 vector was found in all of the
recombinants analyzed but one (R23; Fig. 2). Surprisingly, a second junction
between these two DNAs was never found in the same recombinant. In AAV, the
junctions were localized in the itr or near the P5 promoter (Fig. 2a and b).
In one case (R40), the junction was found in the rep gene sequence.

In 80% of the cases, the junctions in the AAVS1 sequence were found in the
5' 510-bp segment previously defined as the minimum chromosome 19 DNA
sequence able to direct AAV integration (12). Strikingly, the junctions
within the 510-bp segment were tightly clustered around the AAVS1 Rep binding
site (Fig. 2a and c). Only two junctions were found 3' to the 510-bp segment
(in R1 and R35). Because the AAVS1 DNA at the junction was found to be
oriented towards the 3' end of AAVS1 (R35 is the sole exception; Fig. 2) and
because a second junction with AAVS1 DNA was never found in the same
recombinant DNA molecule, we wished to determine if AAVS1 sequences 5' to the
insertion might have been preferentially lost during integration. Therefore,
we used three AAVS1 oligonucleotides (nt 60 to 79, 248 to 265, and 453 to
435) to screen by dot blot hybridization for the presence of AAVS1 DNA 5' to
the junction with AAV DNA. The 5' AAVS1 DNA sequences were found in 7 of 11
cases (Table 1). Thus, although AAVS1 DNA 5' to the AAV junction was present
in 64% of the recombinant vectors, this DNA segment was apparently
translocated to other positions in the vector genome. Whether the
rearrangements of AAVS1 DNA occurred during AAV integration or as a
consequence of vector rearrangements which have been observed to occur prior
to AAV infection (12) is not known, although the fact that the rearrangement
always occurs immediately 5' to the junction site suggests the former
possibility.

TABLE 1. AAVS1 sequences from the 5' 510-bp fragment present
in EBV recombinant vectors^a

Recombinant (junction in      Hybridization with AAVS1 oligonucleotide:
AAVS1 sequence [nt])          1 (nt 60-79)    2 (nt 248-265)    3 (nt 453-435)

R1 (1600)
R2 (421)                                                        +
R19 (479)                     +               +                 +
R23 (?)                       +               +                 +
R24 (417)                     +               +                 +
R27 (386)                                                       +
R31 (398)                                                       +
R35 (648)                     +               +
R38 (408)                     +               +                 +
R39 (387)                     +               +                 +
R40 (409)                     +               +                 +

^a The presence of sequences from the 5' end of the preintegration locus was
assessed by dot blot hybridization with oligonucleotide probes.

[Fig. 3 panels: (a) AAV/p220.2 junction sequences for recombinants R1 (1),
R2 (2), R23 (11), R27 (5), R35 (7), R38 (8), R39 (9), and R40 (10);
(b) junction positions on the AAV genome; (c) junction positions on the
p220.2 plasmid, with the AAVS1 sequence, HSV tk promoter, poly a, hygR,
Col E1, ori, and amp R marked.]

FIG. 3. Junctions between AAV and p220.2 sequences outside of the AAVS1
segment. (a) Sequences at the nucleotide level. Boxed sequences at the
junctions are common to both viral and vector genomes. A junction number (in
parentheses) was given to each recombinant, and the same number is used in
panels b and c. The horizontal dotted lines separate the three groups of
recombinants isolated from three different cell lines (Fig. 1). (b) Schematic
representation of the junctions with respect to the AAV genome. (c) Schematic
representation of the junctions with respect to the p220.2 plasmid genome.
The 8.9-kb p220.2 plasmid contains the herpes simplex virus (HSV) thymidine
kinase (tk) promoter and polyadenylation signal (poly a) flanking a
hygromycin resistance-encoding gene (hygR), the EBV latent origin of
replication (oriP), the EBV-encoded nuclear antigen EBNA-1, and E. coli
sequences (Col E1 and ori ampR).
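
The two proportions quoted above (80% and 64%) can be rechecked from the
junction positions listed in Table 1. The short Python sketch below is only an
illustration: the dictionary and set names are hypothetical, and the set of
recombinants scored as retaining 5' AAVS1 DNA is an assumed reading of the
table (those positive with oligonucleotides 1 and 2).

```python
# Junction positions in AAVS1 (nt) as listed in Table 1.
# None marks R23, for which no AAVS1 junction was found.
JUNCTION_NT = {
    "R1": 1600, "R2": 421, "R19": 479, "R23": None, "R24": 417, "R27": 386,
    "R31": 398, "R35": 648, "R38": 408, "R39": 387, "R40": 409,
}

# Assumed reading of Table 1: recombinants positive with the 5' AAVS1
# oligonucleotides 1 (nt 60-79) and 2 (nt 248-265).
FIVE_PRIME_POSITIVE = {"R19", "R23", "R24", "R35", "R38", "R39", "R40"}

SEGMENT_END = 510  # length of the 5' AAVS1 segment, in bp


def count_within_segment(junctions, segment_end=SEGMENT_END):
    """Return (junctions inside the 5' segment, junctions that were mapped)."""
    mapped = [nt for nt in junctions.values() if nt is not None]
    inside = [nt for nt in mapped if nt <= segment_end]
    return len(inside), len(mapped)


inside, mapped = count_within_segment(JUNCTION_NT)
print(f"{inside} of {mapped} mapped junctions lie within the 510-bp segment "
      f"({100 * inside / mapped:.0f}%)")

retained = len(FIVE_PRIME_POSITIVE)
total = len(JUNCTION_NT)
print(f"{retained} of {total} recombinants retain 5' AAVS1 DNA "
      f"({100 * retained / total:.0f}%)")
```

Under these assumptions, the 80% figure refers to the 10 recombinants with a
mapped AAVS1 junction, whereas the 64% figure refers to all 11 sequenced
recombinants.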

Inspection of the AAV-AAVS1 junction sequences (Fig. 2a) 
revealed only limited patchy nucleotide sequence homology, 
indicating that integration into the vector occurs by nonho- 
mologous recombination. 

(ii) Junctions between AAV DNA and p220.2 vector DNA 
outside of the AAVS1 segment. As noted above, the second 
junction between AAV DNA and the vector genome occurred 
outside of the AAVS1 segment in each recombinant examined. 
With respect to the AAV genome, these junctions were located 
in the itr or in the cap gene (in the latter case, substantial 
portions of the rep and capsid genes were deleted in the re- 
combinant genome) (Fig. 3a and b). With respect to the p220.2 
vector, the junctions were found in the EBV oriP sequence, in 
the EBV EBNA-1 gene, and in the hygromycin resistance- 
encoding gene (Fig. 3a and c). Despite the crucial role of these 
vector elements for stable episome propagation in animal cells 
(27), the period of 48 h between AAV infection and recovery 
of the vectors in bacteria was probably too short for the cells to 
lose the plasmid by dilution. As expected, no junctions were 
found in the E. coli elements necessary for selection in bacteria 
by virtue of the screening procedure. 

(iii) Head-to-tail junctions between AAV sequences. When primers located
near the left and right itrs (AAV1 and AAV9; Fig. 1A) were used to sequence
the junctions, it became evident that the itr and adjacent DNA were sometimes
organized in a head-to-tail fashion, relative to the wild-type AAV genome
(recombinants R1, R2, R19, R27, and R31 in Fig. 4). In the wild-type AAV
genome, the first 125 nt of the 145-nt itr can form a T-shaped hairpin
structure because of the presence of two small internal palindromes (B-B' and
C-C') flanked by a larger palindrome (A-A') (1). In addition, the complete
itr element includes a single-stranded D region located outside of the
palindromic hairpin. The head-to-tail AAV-AAV recombinant junctions are
composed of one complete itr (D-A'-B'-B-C'-C-A) expanded by a 20-nt D' region
(D-A'-B'-B-C'-C-A-D'). Thus, in the recombinants with the head-to-tail
orientation, the 3' end of the AAV genome is joined to nt 126 at the 5' end
(Fig. 4). The R1, R2, R19, and R27 recombinants which display the unusual
head-to-tail organization of the itr and adjacent DNA are also characterized
by two additional common features: the junction with AAVS1 occurs near the P5
promoter, and a substantial segment of the AAV internal sequences (the cap
and rep genes) is deleted (Fig. 2; see Fig. 6A). The fifth recombinant
structure, which displays the head-to-tail organization of the itr and
adjacent DNA at one end (R31), contains the internal AAV sequences and is
linked to AAVS1 DNA via a disrupted itr at the other end (Fig. 6C); the AAV
genome integrated into the R31 recombinant vector was rescued by adenovirus
infection (see below).
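
The legend to Fig. 4 names two arrangements of the expanded itr,
D-A'-B'-B-C'-C-A-D' and D-A'-C'-C-B'-B-A-D'. As a small illustration, and
assuming the usual convention that a primed segment is the complement of the
unprimed one, the sketch below shows that the two arrangements are reverse
complements of each other at the level of segments. The function name is
hypothetical and the code works on segment labels only, not on nucleotide
sequence.

```python
def reverse_complement(arrangement: str) -> str:
    """Reverse the segment order and swap each segment with its complement.

    Segments are separated by '-'. A trailing apostrophe marks the complement,
    so the complement of A is A' and the complement of A' is A.
    """
    def complement(segment: str) -> str:
        return segment[:-1] if segment.endswith("'") else segment + "'"

    segments = arrangement.split("-")
    return "-".join(complement(s) for s in reversed(segments))


first = "D-A'-B'-B-C'-C-A-D'"
second = "D-A'-C'-C-B'-B-A-D'"

print(reverse_complement(first))            # the second arrangement
print(reverse_complement(first) == second)  # True
```

Read this way, the two notations would describe the same junction viewed from
opposite strands, with the B and C palindromes exchanged.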



[Fig. 4 diagram: in the AAV genome, nt 4527 to 4535 (TTAACTACA) and nt 146 to
154 (GGAGGGGTG) lie next to the itrs at opposite ends; in the head-to-tail
AAV/AAV junction in R1, R2, R19, R27, and R31, TTAACTACA (nt 4527 to 4535) is
joined through the D-A'-B'-B-C'-C-A-D' element to GGAGGGGTG (nt 146 to 154).]

FIG. 4. Junctions between AAV sequences. The sequences D-A'-B'-B-C'-C-A-D'
and D-A'-C'-C-B'-B-A-D' refer to the head-to-tail (H-T) organization of
the recombinant AAV itr relative to the wild-type itr. The gap indicates a
disruption in the sequence with respect to the scale of the drawing.

[Fig. 5 panels: (a) AAV sequences joined to unknown sequences in R19
(junction 3), R23 (junction 11), R24 (junction 4), and R31 (junction 6);
(b) positions of these junctions on the AAV genome (4.68 kb): nt 287
(junction 6), 1352 (junction 3), 4595 (junction 4), and 4603 (junction 11).]

FIG. 5. Junctions between AAV and nonviral-nonvector sequences. (a) Sequences
at the nucleotide level. A junction number (in parentheses) was given to
each recombinant, and the same number is used in panel b. (b) Schematic
representation of the junctions with respect to the AAV sequence.



The head-to-tail organization of AAV DNA in recombinant R1, R2, R19, R27,
and R31 DNAs was confirmed by PCRs with oligonucleotides AAV1 and AAV8
(Fig. 1A) as primers. The product was 230 nt long as judged by gel
electrophoresis and hybridized with the AAV9 oligonucleotide (Fig. 1), thus
confirming the presence of itr DNA.

When oligonucleotide primers AAV1 and AAV8 were used independently in PCRs,
no products were detected, indicating that the organization of these AAV-AAV
recombinant junctions is not head to head or tail to tail. The recombinants
showing head-to-tail organization of the AAV itr and adjacent DNA were
isolated in independent experiments from two different infected cell lines
[R1 and R2 from C17-p220.2(AAVS1 kb 0-8.2) cells and R19 and R27 from
C17-p220.2(AAVS1 kb 0-1.6) cells] (Fig. 1). The similarity of the
head-to-tail organization of AAV DNA in each case is striking and suggests a
common recombination-integration mechanism. The possibility that this
involves circularization of AAV DNA prior to or during integration is
discussed below.
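
The reasoning behind the single-primer control can be made explicit with a
small model. The sketch below is an illustration only. It assumes, and this is
not stated in the text, that AAV1 anneals near the left end (head) of the
genome and AAV8 near the right end (tail), and that each primer is extended
outward toward its own end. All function and variable names are hypothetical.

```python
# Assumption (not from the text): each primer sits near one end of the AAV
# genome and is extended outward, toward that end.
PRIMER_AT_END = {"head": "AAV1", "tail": "AAV8"}


def primers_needed(junction: str) -> frozenset:
    """Primers that face each other across a junction such as 'head-to-tail'."""
    end_a, end_b = junction.split("-to-")
    return frozenset({PRIMER_AT_END[end_a], PRIMER_AT_END[end_b]})


def gives_product(primers_in_reaction, junction: str) -> bool:
    """True if the reaction contains every primer the junction requires."""
    return primers_needed(junction) <= set(primers_in_reaction)


for junction in ("head-to-tail", "head-to-head", "tail-to-tail"):
    results = {
        "AAV1 + AAV8": gives_product(["AAV1", "AAV8"], junction),
        "AAV1 alone": gives_product(["AAV1"], junction),
        "AAV8 alone": gives_product(["AAV8"], junction),
    }
    print(junction, results)
```

In this model only a head-to-tail junction gives a product with both primers
and none with either primer alone, which is the pattern reported for R1, R2,
R19, R27, and R31.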

(iv) Junctions between AAV DNA and DNA unrelated to vector or viral DNA.
DNA unrelated to the vector or viral genomes was found at the junctions with
AAV DNA in four recombinants (R19, R23, R24, and R31; Fig. 5). The unrelated
sequence in R24 shows some homology with the DNA of the human Line-1 element
(long interspersed repetitive sequence derived from a retrotransposon) (19).
An oligonucleotide prepared from the R24 unrelated sequence hybridized with
the DNAs of the R19 and R23 recombinants (but not with R31 DNA). All of the
recombinants with nonviral-nonvector DNA related to human Line-1 DNA were
derived from a single infected cell line propagating the p220.2(AAVS1
0-1.6 kb) episome. One vector isolate of this cell line which did not contain
AAV DNA also hybridized with the oligonucleotide derived from the R24
nonviral-nonvector segment. These results suggest that some p220.2(AAVS1
0-1.6 kb) episomes acquired the human Line-1 DNA prior to recombination with
AAV DNA. The human Line-1 element is known to be capable of existing as part
of an extrachromosomal circular element (20).

Summary of recombinant junctions. On the basis of the 
sequencing data from 11 recombinants, the junctions can be 
grouped as shown in Fig. 6. In group A, one junction with the 
episome joins AAV DNA from around the P5 promoter to the 



[Fig. 6 panels, with the junction labels shown for each AAV insert:
A (R1, R2, R19, R27): junctions in the cap or rep genes either with p220.2
plasmid or an unknown sequence; junctions around the P5 promoter with AAVS1
sequence.
B (R24): junction in the itr with AAVS1 sequence; junction in the itr with
unknown sequence.
C (R31): junction in the itr with AAVS1 sequence; junction close to the P5
promoter with an unknown sequence.
D (R23, R35, R38, R39, R40): junctions either in the capsid genes or close to
the itr with p220.2 plasmid; junction in the itr with AAVS1 sequence or an
unknown sequence; junction in the itr with p220.2 plasmid; junctions in the
rep genes with AAVS1 sequences.]

FIG. 6. Patterns of AAV inserts and junctions in EBV recombinant vectors.
W, head-to-tail (H-T) organization of the itr (see the text for details) and
adjacent DNA; V, disrupted itrs.



episomal AAVS1 segment; the second junction occurs either between a p220.2
sequence (that is outside the AAVS1 segment) and the AAV capsid genes (R1,
R2, and R27) or between an AAV rep gene sequence and nonviral-nonvector DNA
(probably Line-1 DNA) (R19). All of the recombinants in group A show
head-to-tail organization of the AAV itr and adjacent sequences, and much of
the internal AAV sequences are deleted. In group B (R24), all of the internal
AAV sequences appear to be present; one junction with AAVS1 is via a
disrupted itr element; the second junction with human Line-1 DNA in the
episome also occurs via a disrupted itr element. In group C (R31), the
organization of the AAV insert is similar to that of group B except that one
junction shows the head-to-tail organization of the viral itr and adjacent
DNA and connects with episomal nonviral-nonvector DNA at a point near the P5
promoter. The AAV insert in R31 was rescuable by adenovirus infection (see
below). Group D (R23, R35, R38, R39, and R40) is characterized by the
presence of a segment of AAV DNA and a disrupted itr joined to either AAVS1,
p220.2, or the nonviral-nonvector DNA.

Rescue of AAV from EBV recombinant vectors. It was of interest to determine
if the AAV DNA in the recombinants which reacted with all of the
virus-specific oligonucleotide probes (R4, R5, R9, R17, R23, R24, R26, R29,
and R31 in Fig. 1) could be rescued by infection with adenovirus.
Accordingly, C17 cells were first infected with Ad2 and then transfected with
the plasmid-purified shuttle vector-AAV recombinants noted above. At 40 h
posttransfection-infection, expression of the AAV capsid genes, rescue and
replication of the AAV DNA, and production of infectious particles from these
EBV recombinant vectors were assessed.

AAV capsid gene expression was determined by Western blot (immunoblot)
analysis with a polyclonal AAV capsid antibody (Fig. 7). R31 and R26 (data
not shown) expressed the capsid proteins at a level similar to that seen with
the pSM620 plasmid (34), suggesting that the AAV DNA is rescued from the EBV
recombinant vectors and replicated. A low expression level of the VP
polypeptides was seen with some of the other recombinants (Fig. 7).

FIG. 7. Capsid gene expression of AAV rescued from EBV recombinant vectors.
Immunoblot analysis of C17 cells infected with Ad2 and transfected with
plasmid pAAV/Ad (35) (lane 3), plasmid pSM620 (34) (lane 4), recombinant R31
(lane 5), recombinant R9 (lane 6), recombinant R24 (lane 7), or recombinant
R23 (lane 8). Lanes 1 and 9 contained, respectively, a Rainbow molecular
weight marker (Amersham) and a prestained SDS-polyacrylamide gel
electrophoresis standard high-range marker (Bio-Rad). The molecular masses of
proteins VP1, VP2, and VP3 are 87, 73, and 62 kDa. Lane 2 contained a
mock-transfected control. The antiserum used in the immunoblots was a
polyclonal anti-AAV capsid serum.

To assess rescue and DNA replication, the low-molecular-weight DNA was
extracted from the transfected-infected C17 cells and analyzed on a 0.7%
agarose gel (Fig. 8A). Only the extracts from cells transfected with
recombinants R31 and R26 (data not shown) contained detectable amounts of the
AAV monomer and dimer forms also seen when AAV DNA is rescued from the pSM620
plasmid. These products were resistant to DpnI digestion (data not shown).
Hybridization with an AAV probe (Fig. 8B) confirmed the nature of the DNA.

FIG. 8. Rescue and replication of AAV DNA from EBV recombinant vectors.
Extrachromosomal DNA was extracted from C17 cells infected with Ad2 and
transfected with recombinant R9 (lane 2), recombinant R31 (lane 3), or
plasmid pSM620 (lane 4) or mock transfected (lane 5). Lanes 1 and 6 contained
DNA molecular weight markers III and VII (Boehringer Mannheim). (A) Agarose
(0.7%) gel stained with ethidium bromide. (B) Southern blot hybridization
with an AAV probe. Abbreviations: m, monomer; d, dimer; Ad, adenovirus.

Production of infectious particles was assessed by adsorbing 
the lysates of transfected-infected C17 cells to adenovirus-infected
HeLa cells. At 40 h postinfection, the presence of
progeny AAV in the HeLa cells was determined by dot blot 
hybridization (AAV probe) of HeLa cell lysates treated first 
with DNase I (to screen for DNase-resistant particles) and 
then with NaOH to extract and denature the DNA. Two re- 
combinants (R26 and R31) gave rise to progeny AAV in the 
above-described procedure. Both R26 and R31 contain the 
head-to-tail organization of the itr and adjacent sequences. 
Thus, despite this divergence from the wild-type organization 
of the AAV genome, the AAV inserts in R26 and R31 could be 
rescued and replicated to infectious particles. 

DISCUSSION 

AAV is unique among animal viruses in its capacity to un- 
dergo site-specific chromosomal integration. To aid the inves- 
tigation of the targetting mechanism, we previously developed 
an EBV-based shuttle vector system in which the chromosome 
19 AAVS1 preintegration DNA propagates as an episome into 
which AAV can integrate (12). The objective of the present 
experiments was to define the structure of the recombinant 
junctions in episomal integration and to compare such junc- 
tions with AAV insertions into AAVS1 when it is part of the 
intact chromosome. 

Although head-to-tail and tail-to-tail arrangements in chro- 
mosomal inserts containing multiple copies of the AAV ge- 
nome have been reported previously (5, 22, 26, 30, 45), the 
finding of head-to-tail organization of the AAV itr and adja- 
cent sequences in 5 of 11 recombinants sequenced (Fig. 6A and 
C) was not anticipated because only 1 of these contained an 
insert of greater than unit length. In these recombinants, the itr 
was modified by addition of a 20-nt D region and linked the 3' 
end of the AAV genome to AAV nt 126 at the 5' end. In all 
five of these recombinant junctions, the crossover point with 
the vector occurred at or slightly downstream of the P5 pro- 
moter at AAV nt 250 (Fig. 2b). Although the retention of viral 
DNA in this group of independently isolated recombinants 
varied from a rescuable, complete genome equivalent in R31 
to extensive deletions in the rep and cap genes in R1, R2, R19,
and R27, the structure of the head-to-tail modification was 
remarkably similar in each case. Multiple copies of the AAV 
genome in tandem array could occur in an insert as the con- 
sequence of either recombination or replication of the inserted 
sequence before or during integration. Recombination via the 
itr could result in either head-to-tail, tail-to-tail, or head-to- 
head orientation of adjacent copies of the genome. The former 
two possibilities have been reported (5, 16, 22, 26, 30, 45). 
Although the predominance of head-to-tail orientation in
chromosomal integration suggests that a rolling-circle form of
replication (11) is involved, even in this event the first step
would be formation of a circular intermediate by recombination
via the itr. Integration of less than a full
genome equivalent of any of the recombinants characterized in 
this report does not require replication but does suggest that a 
circular molecule is an intermediate in the process. As discussed
elsewhere (31), if such integrated forms do arise by
replication, the mode of replication must differ from that of 
AAV during lytic infection, which gives rise only to head-to- 
head or tail-to-tail intermediates (1, 17, 40). The proposal that 
viral genomes may undergo an alternative mode of replication, 
quite different from that characteristic of replication during 
lytic infection, is not without precedent. Aberrant rolling-circle
replication intermediates, with high recombinogenic potential,
have been described for papovavirus DNA (2, 7, 15),
which normally replicates bidirectionally, giving rise to theta- 
type intermediates during lytic infection (41). Indeed, rolling- 
circle replication intermediates have been proposed as inter- 
mediates in papovavirus integration (8). 

Previously we have shown that a 510-bp segment at the 5' 
end of AAVS1 that has been cloned (23) is capable of directing 
site-specific integration of AAV into the EBV-based episomal 
vector (12). Likely recognition signals that might target the 
AAV genome to AAVS1 are the Rep binding site at AAVS1 nt 
398 to 413 (6, 42), a potential terminal resolution site (38) at nt 
384 to 389, and the yeast recombinogenic M26 motif (32, 37) at 
nt 277 to 283. Our analysis of the recombinant junctions in the 
episome has shown that in 80% of the cases examined, the
AAV insertion points are tightly clustered around the Rep 
binding site (AAVS1 nt 386 to 479; Fig. 2c). However, two 
junctions with AAVS1 DNA, both 3' to the 510-bp segment
(AAVS1 nt 648 and 1600), were also detected. The tight clustering
of a major proportion of crossover points around the
potential terminal resolution site and Rep binding motif cer- 
tainly underlines the importance of these signals in the target- 
ting mechanism. 

In latently infected cell lines, the few AAV insertion points 
that have been determined were located at AAVS1 nt 1026 to 
1030 and 1144 to 1146 (23), in a region from nt 713 to 1303 (36, 
45), and at position 727 (13). These insertion points in AAVS1, 
when it is part of intact chromosome 19, are somewhat 3' to 
the tight cluster around positions 386 to 479 when AAVS1 is 
propagated in an episomal vector. The reason for this differ- 
ence is not known. It should be noted, however, that mapping 
of AAVS1 recombinant junctions in latently infected cells can 
be performed only after many cell generations of growth sub- 
sequent to the initial integration event. In contrast, the recom- 
binants generated in the EBV episomal system are not subject 
to this constraint. 
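
To make the comparison concrete, the distance of each insertion point from
the Rep binding site (AAVS1 nt 398 to 413) can be tabulated. The sketch below
is an illustration under stated assumptions: the episomal positions are the
eight junctions within the 510-bp segment listed in Table 1, the chromosomal
positions are the first nucleotide of each single site or short range quoted
above (727, 1026, and 1144), and the broader region from nt 713 to 1303 is
left out because it is not a single point. The names are hypothetical.

```python
REP_BINDING_SITE = (398, 413)  # AAVS1 nt, as given in the text

# Episomal junctions within the 510-bp segment (Table 1).
EPISOMAL = [386, 387, 398, 408, 409, 417, 421, 479]
# Chromosomal insertion points quoted in the text (first nt of each site).
CHROMOSOMAL = [727, 1026, 1144]


def distance_from_site(nt, site=REP_BINDING_SITE):
    """Signed distance in nt: 0 inside the site, negative 5' of it, positive 3'."""
    start, end = site
    if nt < start:
        return nt - start
    if nt > end:
        return nt - end
    return 0


for label, positions in (("episomal", EPISOMAL), ("chromosomal", CHROMOSOMAL)):
    distances = [distance_from_site(nt) for nt in positions]
    print(label, distances)
```

With these inputs, the episomal junctions fall between 12 nt 5' and 66 nt 3'
of the Rep binding site, whereas the chromosomal insertion points lie more
than 300 nt 3' of it.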

Apart from the difference in the distance of the insertion 
points from the Rep binding motif, noted above, integration of 
AAV into AAVS1 in the manipulatable EBV-based episomal
system shows many points of similarity to AAV integration into 
AAVS1 as part of intact chromosome 19. A genome equivalent 
of the viral DNA can be rescued by adenovirus superinfection 
in both cases. The AAVS1 target is substantially disrupted and 
rearranged both in the episome and in the chromosome. In the 
episome, AAVS1 sequences 5' to the viral insertion are no 
longer present at the junction, having apparently been trans- 
located to other vector positions. In the chromosome, disrup- 
tion of AAVS1 is severe enough that it becomes diagnostic for 
AAV site-specific integration (25). The head-to-tail organiza- 
tion of the viral itr and adjacent DNA can also be encountered 
in both chromosomal integration (45) and episomal integra- 
tion. Similarly, viral recombinant junctions localized to the P5 
promoter region (which also contains a Rep binding sequence; 
28, 29) have been noted in both systems (45). Thus, the highly 
manipulatable EBV-based episomal system reflects many as- 
pects of AAV integration at the intact-chromosomal level. 
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